This paper introduces the Markov-Switching Multifractal Duration (MSMD)
model by adapting the MSM stochastic volatility model of Calvet and Fisher (2004)
to the duration setting. Although the MSMD process is exponential $\beta$-mixing, as
we show in the paper, it is capable of generating highly persistent autocorrelation.
We study analytically and by simulation how this feature of durations generated 
by the MSMD process propagates to counts and realized volatility. We employ a 
quasi-maximum likelihood estimator of the MSMD parameters based on the Whit- 
tle approximation and show that it is a computationally simple and fast alternative 
to the maximum likelihood estimator, and works for general MSMD specifications.
Finally, we compare the performance of the MSMD model with competing short- 
price durations of three major foreign exchange futures contracts. The results of the
comparison show that the long-memory models perform similarly and are superior 
to the short-memory ACD models. 
1 Introduction



Financial durations measure the time elapsed between various financial market events re- 
lated to transactions arrivals, price fluctuations, or trading volumes. Modeling durations 
may be useful for measuring and predicting instantaneous volatility and integrated vari- 
ance and so may aid high-frequency volatility trading and risk management. Exploiting 
the intimate relationship between durations and volatility, Tse & Yang (2010) employ 
parametric duration models to measure daily volatility using high-frequency data. An- 
dersen, Dobrev & Schaumburg (2008) propose a nonparametric duration-based approach 
to measuring volatility by relying on the properties of Brownian motion. More gener- 
ally though, durations are useful for gaining more insight into any information events or 
variables which change values at each tick, as implied by the theory of market microstruc- 
ture, and thus may be useful for examining a number of interesting economic hypotheses 
related to trading and price discovery; see Engle (2000) for an excellent discussion. 

A key stylised fact noted in the empirical irregularly-spaced event literature is long
memory in financial durations. Ever since the seminal contribution of Engle & Russell
(1998), who introduced the first time-series model for financial durations, a number of
studies have documented the slowly decaying autocorrelation function of transaction,
price and volume durations; see Pacurar (2008) for a detailed literature review. Deo,
Hsieh & Hurvich (2010) test for long memory in durations and the associated
counts and find significant evidence to support the presence of long memory in durations.
Despite this empirical regularity, there is currently no paper that explores the alternative
models for capturing the persistent autocorrelations of durations and their implications for
forecasting. We aim to fill this gap.

Inspired by the success of the Markov-Switching Multifractal (MSM) stochastic volatility
model of Calvet & Fisher (2004) in forecasting persistent volatility of financial returns,
we start by adapting the MSM model to the duration setting, calling the new model the
Markov-Switching Multifractal Duration (MSMD) model. This model adds to the class
of stochastic duration models of Bauwens & Veredas (2004) and Deo et al. (2010), which
also evolved from the stochastic volatility literature, though the latent process driving
the dynamics of durations in an MSMD model is a Markov chain rather than a Gaussian
AR(FI)MA process. We show that although the MSMD process is exponential $\beta$-mixing
and short-memory, it may exhibit a slowly decaying autocorrelation function over a wide
range of lags. This long-memory feature of the process is induced by regime switching
of heterogeneous persistence: the process is driven by $k$ independent Markov-switching
processes with different, though tightly parametrized, transition probabilities.

Relying on the recent results of Deo, Hurvich, Soulier & Wang (2009) on the propa- 
gation of memory from durations to counts and realized volatility, we establish formally 
that the short memory of MSMD durations translates into short memory in counts and 
realized volatility. However, within the simple pure-jump model of Oomen (2006), we
show by simulation that despite being a short-memory process, the MSMD model is 
capable of generating highly persistent realized volatility. Intuitively, the MSMD model 
lies "between" the short-memory ACD model of Engle & Russell (1998) and the Long- 
Memory Stochastic Duration (LMSD) model of Deo et al. (2010) in the sense that its 
autocorrelation function can decay much slower over a wider range of lags than that 
of the ACD model, but eventually assumes an exponential rate of decay, unlike the 
autocorrelation function of the LMSD model. 

Second, we propose quasi-maximum likelihood estimation of the MSMD parameters 
based on the Whittle approximation. The large number of duration observations typically
available to the econometrician renders the computationally intensive exact maximum
likelihood estimation (MLE) relatively demanding. Moreover, exact MLE is limited
to the case of an MSM specification with a finite number of states, while for the case of 
an infinite state space one has to resort to simulated maximum likelihood via the particle 
filter. Contrary to this, the Whittle estimator works in either case and is computationally 
simple and fast. Computational speed is not a mere convenience in our context: given 
the increasing importance of algorithmic and high-frequency traders, who are capable of 
generating tens of thousands of limit and market orders in a single day, the amount of 
data usable for estimation has grown enormously in many markets (Hasbrouck & Saar, 
2010). For such environments, fast estimation methods simply become a necessity, even 
with ever-faster modern computers. Last but not least, the Whittle estimator can be 
easily adapted to the original MSM stochastic volatility model of Calvet & Fisher (2004), 
and thus represents a contribution to the MSM literature that goes beyond the context 
of financial durations. 

Finally, we compare our estimation and forecasting results with those possible from 
established duration models. As noted by Pacurar (2008), there is a scarcity of compar- 
isons of duration models, and ideally one would like to undertake a comparison of all the 
models she has detailed. However, as noted above, only long memory models are able 
to account for the key stylised fact of long-range dependence in durations. We therefore 
restrict attention to the LMSD model of Deo et al. (2010). To investigate the benefits of 
the relatively complicated MSMD and LMSD models over their simple and easy-to-estimate
short-memory counterparts, we also compare our results with those from the widely
used Autoregressive Conditional Duration (ACD) model introduced in the seminal paper
by Engle & Russell (1998). We implement the models on price durations of three major
foreign exchange futures contracts traded on the Chicago Mercantile Exchange (CME) 
in the period between 9 November 2009 and 29 January 2010: Euro, Japanese Yen, and 
Swiss Franc. We find that while the LMSD and MSMD models deliver generally similar 
forecast performance, both significantly outperform the ACD model individually, as well 
as when equally weighted. 

The Markov switching multifractal duration model has been proposed independently 
and in parallel to our work in a recent paper by Chen, Diebold & Schorfheide (2012) 
(henceforth CDS). While the main thrust of CDS is the same - the application of the MSM 
process of Calvet & Fisher (2004) to financial durations - there are several differences 
that distinguish the two papers. CDS motivate the MSMD model by the mixture-of- 
distributions hypothesis, whereas our main motivation lies in the long-memory features 
of the MSM process, and the relationship between persistence of durations and realized 
volatility. We are not restricting attention to the binomial MSMD model with exponen- 
tially distributed innovations, but consider more general versions of the model. Allowing 
for a wider class of distributions is made possible in practice by employing the Whittle 
estimator, and it turns out to be empirically beneficial. In terms of empirical appli- 
cation, we differ from CDS by modeling and forecasting price durations as opposed to 
transactions durations, and focus on foreign exchange futures prices in 2009/2010 rather 
than individual equities in 1993. Finally, given the high persistence of the durations in 
our sample, the natural competitor of the MSMD model is the LMSD model rather than 
the short-memory ACD, and hence, unlike CDS, we include the LMSD model in our 
forecasting exercise as well. 

The rest of the paper is organized as follows. Section 2 introduces the MSMD model 
and discusses its properties. Section 3 discusses estimation and forecasting for the MSMD 
model. Section 4 reviews the competing duration models and Section 5 looks at the link 
between durations, counts and realized volatility. In Section 6 we describe the data and 
in Section 7 we present estimation and forecasting results. Section 8 concludes. 

2 The MSMD Model 

Let $x_i = t_i - t_{i-1}$ denote the duration between two event arrival times. The three most
common events studied in the literature relate to transaction arrivals, price changes and
transaction volumes. The Markov-Switching Multifractal Duration (MSMD) model is
defined by:

$$x_i = \psi_i \varepsilon_i, \qquad \varepsilon_i \sim D(\theta_\varepsilon), \tag{2.1}$$

where $\psi_i$ is the Markov-switching multifractal process of Calvet & Fisher (2004):

$$\psi_i = \psi \prod_{j=1}^{k} M_{j,i}, \tag{2.2}$$

and $\varepsilon_i$ is a sequence of independent unit-mean innovations identically distributed
according to some parametric distribution $D(\theta_\varepsilon)$, where $\theta_\varepsilon$ denotes a vector of
parameters. The latent process in (2.2) is determined by $k$ independent unit-mean multipliers,
$M_{j,i}$, $j = 1, \ldots, k$, and a scaling constant, $\psi$. At every point in time $i$, each multiplier
$M_{j,i}$ takes, with probability $\gamma_j$, a new value $M$ drawn from a common distribution $F_M$,
and remains unchanged with probability $1 - \gamma_j$:

$$M_{j,i} = \begin{cases} M, \text{ where } M \text{ is drawn from } F_M, & \text{with probability } \gamma_j, \\ M_{j,i-1} & \text{with probability } 1 - \gamma_j. \end{cases}$$

The transition probabilities are parsimoniously parametrized by:

$$\gamma_j = 1 - (1 - \gamma_k)^{b^{j-k}}, \qquad j = 1, \ldots, k, \tag{2.3}$$

where $\gamma_k \in (0, 1)$ and $b \in (1, \infty)$. Two specifications for the distribution of the multipliers
$F_M$ have been proposed by Calvet & Fisher (2004): binomial and log-normal. In the
binomial specification, each multiplier, whenever it is renewed, takes the value $m_0$ or
$2 - m_0$, $m_0 \in (1, 2)$, with equal probability, ensuring that the mean is equal to one. The
transition matrix associated with each multiplier is thus given by:

$$P_j = \begin{pmatrix} 1 - \tfrac{1}{2}\gamma_j & \tfrac{1}{2}\gamma_j \\ \tfrac{1}{2}\gamma_j & 1 - \tfrac{1}{2}\gamma_j \end{pmatrix}.$$

Since the multipliers are independent, the transition matrix of the state vector
$M_i = (M_{1,i}, \ldots, M_{k,i})$ is simply:

$$P = P_1 \otimes P_2 \otimes \cdots \otimes P_k, \tag{2.4}$$

where "$\otimes$" denotes the Kronecker product. The dimension of the transition matrix is
$2^k \times 2^k$ and the state vector takes values in the finite state space $\{m_0, 2 - m_0\}^k$.
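
As a minimal sketch of (2.3) and (2.4), the following pure-Python snippet computes the switching probabilities and assembles the transition matrix of the binomial specification. The function names are illustrative and do not come from the paper; the 2 x 2 block assumes that a renewed multiplier lands on its old value with probability one half, which is what the equal-probability draw implies.

```python
def switching_probabilities(k, b, gamma_k):
    """gamma_j = 1 - (1 - gamma_k) ** (b ** (j - k)) for j = 1..k, as in (2.3)."""
    return [1.0 - (1.0 - gamma_k) ** (b ** (j - k)) for j in range(1, k + 1)]


def binomial_block(gamma_j):
    """2x2 transition matrix of one binomial multiplier.

    A renewal occurs with probability gamma_j and then picks either value
    with probability 1/2, so the value changes with probability gamma_j / 2.
    """
    p = 0.5 * gamma_j
    return [[1.0 - p, p], [p, 1.0 - p]]


def kronecker(a, b):
    """Kronecker product of two matrices stored as lists of rows."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def msmd_transition_matrix(k, b, gamma_k):
    """P = P_1 (x) P_2 (x) ... (x) P_k, as in (2.4)."""
    matrix = [[1.0]]
    for gamma_j in switching_probabilities(k, b, gamma_k):
        matrix = kronecker(matrix, binomial_block(gamma_j))
    return matrix


gammas = switching_probabilities(k=6, b=3, gamma_k=0.5)
P = msmd_transition_matrix(k=3, b=3, gamma_k=0.5)
```

Because the matrix has $2^k$ rows and columns, building it explicitly in this way is only practical for a small number of multipliers.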
In general, any distribution with positive support can be used to model the multipliers.
For example, the log-normal specification of Calvet & Fisher (2004) replaces the
Bernoulli distribution by a log-normal one, i.e. upon switching, the new value of the log
multiplier is drawn from $N(-\lambda, 2\lambda)$, where the parameter restriction again imposes unit
means for the multipliers. When drawn from a continuous distribution (with respect to
Lebesgue measure on $\mathbb{R}_+$), each multiplier assumes a new value with probability one,
and the transition kernel of the multiplier is given by

$$\mathbb{P}(M_{j,i+h} \in B_j \mid M_{j,i} = x_j) = \left(1 - (1 - \gamma_j)^h\right)\mathbb{P}(M \in B_j) + (1 - \gamma_j)^h \mathbf{1}\{x_j \in B_j\}, \qquad j = 1, \ldots, k, \tag{2.5}$$

for any $B_j \in \mathcal{B}(\mathbb{R}_+)$ and $x_j \in \mathbb{R}_+$, where $\mathcal{B}(\mathbb{R}_+)$ is the Borel $\sigma$-algebra on $\mathbb{R}_+$. Since the
multipliers are independent, the transition kernel of the chain $M_i$ reads

$$\mathbb{P}(M_{i+h} \in B \mid M_i = x) = \prod_{j=1}^{k} \mathbb{P}(M_{j,i+h} \in B_j \mid M_{j,i} = x_j),$$

for any $x = (x_1, x_2, \ldots, x_k)'$ and any $B \in \mathcal{B}(\mathbb{R}_+^k)$, the Borel $\sigma$-algebra on $\mathbb{R}_+^k$, where
$B = B_1 \times B_2 \times \cdots \times B_k$, $B_j \in \mathcal{B}(\mathbb{R}_+)$, $j = 1, \ldots, k$. The chain takes values in the state space $\mathbb{R}_+^k$.

Having specified the law governing the multipliers, it remains to choose a distribution
for the innovations, $\varepsilon_i$. As is common in the literature, we consider here the
exponential and Weibull distributions. With these specifications of $\varepsilon_i$, the law governing
the durations is a mixture of exponentials and a mixture of Weibull distributions,
respectively. Imposing a unit mean, the corresponding densities are:

$$f_E(\varepsilon) = \exp(-\varepsilon),$$

$$f_W(\varepsilon; \kappa) = \kappa C^{\kappa} \varepsilon^{\kappa - 1} \exp(-C^{\kappa} \varepsilon^{\kappa}), \qquad C = \Gamma(1 + 1/\kappa).$$

For $\kappa = 1$, the Weibull distribution reduces to the exponential distribution with unit
mean. Other, more flexible multi-parameter alternatives have been proposed in the
context of modeling financial durations: the Burr distribution (Grammig & Maurer,
2000) and the generalized gamma distribution (Lunde, 1999), both of which encompass
the exponential and Weibull cases. As we are primarily interested in point forecasts
in this paper, for the sake of parsimony we confine our attention to the exponential and
Weibull distributions.

To illustrate the behavior of the multipliers and durations in the MSMD model, we
plot in Figure 1 simulated samples from the binomial and log-normal MSMD processes
with $k = 6$ multipliers and parameters $b = 3$, $\gamma_k = 0.5$, $m_0 = 1.4$ and $\lambda = 0.15$. In this
MSMD specification, $\gamma_1 = 0.0028$ implies that the most persistent multiplier, $(M_{1,i})$,
switches, on average, around 3 times in a sample of 1,000 observations if it is drawn from
the log-normal distribution, and 1.5 times if it is drawn from the Bernoulli distribution.
The least persistent multiplier, $(M_{6,i})$, switches with probability 0.5 and 0.25 in the
log-normal and binomial MSMD specifications, respectively. Clearly, both specifications can
produce rich dynamics: the duration process is highly persistent but can exhibit sudden
erratic movements as observed in empirical data.
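
The simulation described above can be sketched in a few lines of pure Python. This is an illustrative reading of the binomial MSMD with unit-mean exponential innovations, not the authors' code: the function name, the `seed` argument, and the choice to initialize each multiplier from its ergodic distribution are assumptions made here.

```python
import random


def simulate_binomial_msmd(n, k, b, gamma_k, m0, psi=1.0, seed=0):
    """Simulate n durations x_i = psi * prod_j M_{j,i} * eps_i (binomial multipliers)."""
    rng = random.Random(seed)
    gammas = [1.0 - (1.0 - gamma_k) ** (b ** (j - k)) for j in range(1, k + 1)]
    # Start each multiplier from its ergodic law: m0 or 2 - m0 with equal odds.
    multipliers = [m0 if rng.random() < 0.5 else 2.0 - m0 for _ in range(k)]
    durations = []
    for _ in range(n):
        for j, gamma_j in enumerate(gammas):
            if rng.random() < gamma_j:  # renewal: redraw this multiplier
                multipliers[j] = m0 if rng.random() < 0.5 else 2.0 - m0
        level = psi
        for value in multipliers:
            level *= value
        durations.append(level * rng.expovariate(1.0))  # unit-mean exponential
    return durations


x = simulate_binomial_msmd(n=1000, k=6, b=3, gamma_k=0.5, m0=1.4)
```

Replacing the two-point draw by the exponential of a normal variate with mean $-\lambda$ and variance $2\lambda$ would give the log-normal variant under the same assumptions.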



2.1 Stationarity, ergodicity and strong mixing 

It is relatively easy to establish that the Markov chain $M_i$ driving the MSMD process
is geometrically ergodic as long as the conditions $b > 1$ and $0 < \gamma_k < 1$ are satisfied.
Starting with the binomial MSMD specification, we see that under these conditions all
elements of the transition matrix of the chain (2.4) are strictly positive since $0 < \gamma_j < 1$
for all $j$, and it follows directly from the proof of Theorem 1 in Shiryaev (1995, Chapter
1, Section 12) that the chain is geometrically ergodic. The ergodic distribution is given
by $\pi_l = 1/2^k$, $l = 1, \ldots, 2^k$.

If upon switching the multipliers, $M_{j,i}$, $j = 1, \ldots, k$, are drawn randomly from a
continuous distribution, $F_M$, with support $\mathbb{R}_+$, the transition kernel associated with the
$j$-th multiplier is given in equation (2.5) and the ergodic distribution of the multiplier
reads $\pi(B_j) := \lim_{h \to \infty} \mathbb{P}(M_{j,i+h} \in B_j \mid M_{j,i} = x_j) = \mathbb{P}(M \in B_j)$, $B_j \in \mathcal{B}(\mathbb{R}_+)$. Then for
any $x_j \in \mathbb{R}_+$, $j = 1, \ldots, k$ and $h \in \mathbb{N}$,

$$\sup_{B_j \in \mathcal{B}(\mathbb{R}_+)} \left| \mathbb{P}(M_{j,i+h} \in B_j \mid M_{j,i} = x_j) - \pi(B_j) \right| \le (1 - \gamma)^h,$$

where $0 < \gamma := \min\{\gamma_1, \ldots, \gamma_k\} < 1$. Since the multipliers $M_{j,i}$, $j = 1, \ldots, k$, are independent,
it follows that the chain $M_i$ is geometrically ergodic.

Geometric ergodicity of the Markov chain $M_i$ in turn implies that the duration
process $\{x_i\}$ is strictly stationary and $\beta$-mixing with an exponential rate of decay, provided
that the chain is initialized from the ergodic distribution. To see this, observe that the
duration process belongs to the class of generalized hidden Markov models in the sense
of Definition 3 in Carrasco & Chen (2002): the hidden Markov chain $M_i$ is strictly
stationary and, conditionally on $M_i$, the durations $x_i$ are independently distributed
where the conditional distribution only depends on $M_i$ and not on $i$. Given geometric
ergodicity of the hidden chain, Proposition 4 of Carrasco & Chen (2002) then implies
that the duration process is exponential $\beta$-mixing.

2.2 Moments and autocorrelation function 

In Appendix A.1 we show that the first two moments of the MSMD process are given by

$$E(x_i) = \psi, \tag{2.6}$$

$$\operatorname{Var}(x_i) = \psi^2 \left[ E(M^2)^k E(\varepsilon_i^2) - 1 \right]. \tag{2.7}$$

The model can exhibit both under- and over-dispersion depending on the distributional
assumptions about $M$ and $\varepsilon_i$, since the ratio of the variance to the squared mean,
$E(M^2)^k E(\varepsilon_i^2) - 1$, can in general be smaller or larger than one. An MSMD process
with exponential innovations, however, always exhibits over-dispersion since for an
exponentially distributed $\varepsilon_i$, we have $E(\varepsilon_i^2) = 2$, and by construction $E(M^2) > 1$.

An attractive property of the MSMD model is that it possesses a very flexible autocorrelation
function (ACF) that can exhibit behaviour similar to long memory. Appendix
A.1 shows that for a general MSMD process with finite $E(M^2)$ and $E(\varepsilon_i^2)$, we have:

$$\operatorname{Cov}(x_i, x_{i-h}) = \psi^2 \left[ \prod_{j=1}^{k} \left( 1 + \operatorname{Var}(M)(1 - \gamma_j)^h \right) - 1 \right]. \tag{2.8}$$

Now it follows directly from Proposition 1 in Calvet & Fisher (2004) that the autocorrelation
function of the MSMD durations decays hyperbolically over a large range of
lags before transitioning smoothly into exponential decay. Formally, take two arbitrary
numbers, $\alpha_1$ and $\alpha_2$ in $(0, 1)$, and let $I_k = \{n : \alpha_1 \log_b(b^k) \le \log_b n \le \alpha_2 \log_b(b^k)\}$ denote
a set of integers containing a wide range of lags. Then

$$\sup_{n \in I_k} \left| \frac{\log \operatorname{Corr}(x_i, x_{i+n})}{\log n^{-\delta}} - 1 \right| \to 0$$

as $k \to \infty$, where $\delta = \log_b(E(M^2)/[E(M)]^2)$. So despite being a short-memory process,
the MSMD model can mimic the persistence of a genuine long-memory process with a
hyperbolically decaying autocorrelation function.

For illustration purposes, Figure 2 plots the autocorrelation function of a binomial
MSMD process with exponential innovations and various sets of parameter values. We
take the case of $k = 8$ multipliers and parameters $b = 2$, $\gamma_k = 0.5$ and $m_0 = 1.4$, as
a benchmark and vary each parameter separately to study how it affects the shape of 
the autocorrelation function. Increasing $b$ or decreasing $\gamma_k$ both increase the persistence
of the process since the switching probabilities of the multipliers decrease (panels (a)
and (b)). In the former case, the increase is more pronounced at the long end of the
ACF, while in the latter case it affects the short lags of the ACF more. This is due
to the different impact of a change in $b$ and $\gamma_k$ on the various switching probabilities
as illustrated in panel (a). Increasing the volatility of the multipliers by reducing $m_0$
lowers the multipliers' persistence and thus the persistence of the MSMD process (panel
(c)). Finally, increasing the number of multipliers ($k$) while keeping the parameters of
the model fixed increases persistence (panel (d)).
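
The shape of the ACF can be explored numerically by dividing the autocovariance (2.8) by the variance (2.7). The sketch below does this for the binomial MSMD with unit-mean exponential innovations, using $\operatorname{Var}(M) = (m_0 - 1)^2$ and $E(\varepsilon_i^2) = 2$; the function name is hypothetical and the scaling constant cancels out of the ratio.

```python
def msmd_acf(lags, k, b, gamma_k, m0):
    """Autocorrelations of a binomial MSMD with unit-mean exponential innovations."""
    gammas = [1.0 - (1.0 - gamma_k) ** (b ** (j - k)) for j in range(1, k + 1)]
    var_m = (m0 - 1.0) ** 2            # Var(M) for M in {m0, 2 - m0}, equal odds
    second_moment_m = 1.0 + var_m      # E(M^2), because E(M) = 1
    second_moment_eps = 2.0            # E(eps^2) for a unit-mean exponential
    variance = second_moment_m ** k * second_moment_eps - 1.0  # (2.7) with psi = 1
    acf = []
    for h in lags:
        prod = 1.0
        for gamma_j in gammas:
            prod *= 1.0 + var_m * (1.0 - gamma_j) ** h
        acf.append((prod - 1.0) / variance)                    # (2.8) over (2.7)
    return acf


lags = [1, 10, 100, 1000]
rho = msmd_acf(lags, k=8, b=2, gamma_k=0.5, m0=1.4)
```

Evaluating `msmd_acf` on a grid of lags for different values of `b`, `gamma_k`, `m0` and `k` gives a quick way to examine how each parameter changes the rate of decay.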

2.3 Exogenous and predetermined variables 

Exogenous or predetermined variables can be easily incorporated into the model by
setting $\psi = \psi_i = \exp(\beta_0 + \beta' z_i)$, for some vector of variables $z_i$. This is useful for several
reasons. First, it allows one to incorporate the deterministic intraday duration pattern observed in
most durations data (Engle & Russell, 1998, Bauwens & Veredas, 2004, Fernandes &
Grammig, 2006 and Deo et al., 2010, among many others). Due to the deterministically
varying trading activity during the day, the durations tend to be shorter during the
early and late trading hours, and relatively longer over lunchtime. Second, one may
wish to include additional predictive variables to enhance the forecasting power of the
model. A natural candidate when forecasting price durations may be option-implied
volatility for which high-frequency data is either available readily (e.g. VIX) or can be
constructed from high-frequency options data. Finally, it may be interesting to include
some predetermined variables related to market microstructure as in Engle & Russell
(1998), Bauwens & Veredas (2004) and others.

3 Estimation, inference and forecasting 
3.1 Maximum likelihood and optimal forecasting 

The binomial MSM with finite k implies a finite number of states of the hidden Markov 
process and hence can be estimated by exact maximum likelihood (MLE) via Bayesian 
updating. This has been advocated by Calvet & Fisher (2004) for the binomial MSM 
model of stochastic volatility, and has been shown to work well for sample sizes typically 
used for estimating models of time-varying volatility. Moreover, the Bayesian filter
allows for estimation of the unobserved state probabilities, which in turn permits optimal
forecasting. To save space, we omit the details here and refer the reader to Calvet &
Fisher (2004).
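
For orientation, the Bayesian updating recursion can be sketched as follows for the binomial model with unit-mean exponential innovations. This is a generic hidden-Markov filter written under assumptions made here (uniform ergodic start, exponential density with mean equal to the state-dependent level, illustrative function and variable names); it is not taken from Calvet & Fisher (2004) and is only practical for small $k$.

```python
import math
from itertools import product


def msmd_loglik(x, k, b, gamma_k, m0, psi):
    """Log-likelihood and final filtered state probabilities of a binomial MSMD."""
    gammas = [1.0 - (1.0 - gamma_k) ** (b ** (j - k)) for j in range(1, k + 1)]
    # A state is a tuple of k bits: 0 means the multiplier equals m0, 1 means 2 - m0.
    states = list(product((0, 1), repeat=k))
    levels = [psi * math.prod(m0 if bit == 0 else 2.0 - m0 for bit in s)
              for s in states]

    def transition(s, t):
        # Multipliers move independently; each one flips with probability gamma_j / 2.
        p = 1.0
        for old, new, gamma_j in zip(s, t, gammas):
            p *= (1.0 - 0.5 * gamma_j) if old == new else 0.5 * gamma_j
        return p

    P = [[transition(s, t) for t in states] for s in states]
    size = len(states)
    prob = [1.0 / size] * size            # ergodic (uniform) initial distribution
    loglik = 0.0
    for xi in x:
        # Prediction step: push the filtered probabilities through P.
        pred = [sum(prob[r] * P[r][c] for r in range(size)) for c in range(size)]
        # Update step: weight each state by the exponential density of x_i.
        joint = [p * math.exp(-xi / lv) / lv for p, lv in zip(pred, levels)]
        total = sum(joint)
        loglik += math.log(total)
        prob = [v / total for v in joint]
    return loglik, prob


durations = [0.5, 1.2, 0.3, 2.0]
loglik, filtered = msmd_loglik(durations, k=3, b=2, gamma_k=0.5, m0=1.4, psi=1.0)
```

The cost of each prediction step is quadratic in the number of states, $2^k$, which is the source of the computational burden of exact MLE for larger $k$.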

A disadvantage of the exact maximum likelihood estimator is that it becomes computationally
demanding for $k > 10$, since the dimension of the transition matrix grows
at a rate of $2^k$. Also, it is not applicable to the log-normal MSM process, where the state
space of the hidden Markov chain is infinite. These issues have motivated Lux (2008)
to develop a generalized method of moments (GMM) approach, which works for a wide
range of MSM specifications and requires only moderate computational resources. The
drawback of the GMM estimator of Lux (2008) is that it is applied to the first differences
rather than levels of the process and this makes the identification of the parameters $b$ and
$\gamma_k$ difficult even when the sample size is very large. Lux (2008) circumvents this problem
by setting these parameters to some pre-specified values that seem to work well for a
number of data sets, and estimates by GMM the remaining two parameters only. This
may be quite restrictive, however, especially in our context where no previous evidence
exists to suggest reasonable values of $b$ and $\gamma_k$ for modeling and forecasting financial
durations.

3.2 Whittle estimation 

We propose an alternative autocovariance-based estimator of the MSMD parameters. 
In contrast to Lux (2008) we work in the frequency domain and employ the Whittle 
quasi-likelihood. To obtain better finite-sample properties, we implement the Whittle 
estimator on logarithmic durations, as the logarithmic durations are much closer to being 
Gaussian than the durations themselves. It is well-known that for a stationary Gaussian 
process maximizing the frequency domain representation of the log-likelihood turns out 
to be asymptotically equivalent to the usual maximum likelihood estimator (Whittle, 
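1962). The so-called negative Whittle log-likelihood is given by

$$Q_n(\theta) = \sum_{i} \left[ \log f(\omega_i; \theta) + \frac{I_n(\omega_i)}{f(\omega_i; \theta)} \right], \tag{3.1}$$

where the sum runs over the Fourier frequencies, $f(\omega_i; \theta)$ is the model spectral density and

$$I_n(\omega_i) = \frac{1}{2\pi n} \left| \sum_{t=1}^{n} x_t e^{-\mathrm{i} t \omega_i} \right|^2$$

is the periodogram of the observations $x_1, x_2, \ldots, x_n$, both evaluated at the $i$-th Fourier
frequency, $\omega_i = 2\pi i/n$. The Whittle estimator of $\theta$ is obtained by minimizing $Q_n(\theta)$:

$$\hat{\theta}_n = \arg\min_{\theta} Q_n(\theta).$$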

Now if the process is not Gaussian, which is our case, minimizing the negative Whittle
log-likelihood still works but the resulting estimator is no longer asymptotically equivalent
to MLE. The intuition for $\hat{\theta}_n$ in the non-Gaussian case is straightforward: under
a mixing assumption, the periodogram $I_n(\omega_i)$ is asymptotically distributed as an exponential
random variable with mean $f(\omega_i)$, and for any two Fourier frequencies, $\omega_i$
and $\omega_j$, $i \ne j$, $I_n(\omega_i)$ and $I_n(\omega_j)$ are asymptotically independent (Rice, 1973). Hence
(3.1) has a quasi-likelihood interpretation and $\hat{\theta}_n$ has been shown to be consistent for $\theta$
and asymptotically normally distributed under appropriate regularity conditions.
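
A minimal pure-Python sketch of the periodogram and of a Whittle-type objective follows. It assumes the periodogram is normalized by $2\pi n$ and that the objective sums $\log f + I_n/f$ over the Fourier frequencies strictly between $0$ and $\pi$; both conventions, and all names, are assumptions made here for illustration. The discrete Fourier transform is computed directly, which costs $O(n^2)$ and is only meant for small examples.

```python
import cmath
import math


def periodogram(x):
    """I_n(omega_i) = |sum_t x_t exp(-1j * omega_i * t)|^2 / (2 pi n), i = 0..n-1."""
    n = len(x)
    values = []
    for i in range(n):
        omega = 2.0 * math.pi * i / n
        dft = sum(v * cmath.exp(-1j * omega * t) for t, v in enumerate(x, start=1))
        values.append(abs(dft) ** 2 / (2.0 * math.pi * n))
    return values


def whittle_objective(x, spectral_density):
    """Sum of log f + I_n / f over Fourier frequencies with 0 < omega_i < pi."""
    n = len(x)
    pgram = periodogram(x)
    total = 0.0
    for i in range(1, (n - 1) // 2 + 1):
        f = spectral_density(2.0 * math.pi * i / n)
        total += math.log(f) + pgram[i] / f
    return total


# Toy example: fit a flat (white-noise) spectrum f = s2 / (2 pi) to a short series.
series = [0.3, -1.2, 0.8, 0.1, -0.4, 1.5, -0.9, 0.2]
pgram = periodogram(series)
used = pgram[1:(len(series) - 1) // 2 + 1]
s2_hat = 2.0 * math.pi * sum(used) / len(used)   # closed-form minimizer for a flat f
q_hat = whittle_objective(series, lambda w: s2_hat / (2.0 * math.pi))
```

In practice the model spectral density would be passed as `spectral_density` and the objective handed to a numerical optimizer; a fast Fourier transform would replace the direct sum for realistic sample sizes.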

The application of the Whittle estimator closest to ours is Zaffaroni (2009), who 
focuses on exponential stochastic volatility models. The logarithmic MSMD is also a 
signal-plus-noise model featuring a spectral density that cannot be easily factored as 
in the case of linear processes in the sense that the Whittle log-likelihood cannot be 
expressed as a sum of two components that depend on disjoint parameter sets. We 
conjecture that the asymptotic theory for the short-memory case in Zaffaroni (2009) can 
be adapted to the Whittle estimator of the MSMD model, although a rigorous proof 
of asymptotic normality awaits future research. Consistency, however, follows relatively 
easily by adapting the proof of Theorem 1 in Zaffaroni (2009), in view of the fact that the
spectral density (3.2) is continuously differentiable and bounded from below by a positive
constant, and the MSMD process is exponential $\beta$-mixing as established in Proposition
1 below.

Implementing the Whittle estimator for the MSMD model is easy since the spectral
density is available in closed form. In Appendix A.2 we show that provided the logarithmic
MSMD durations possess finite second moments, their autocovariance function
reads:

$$\operatorname{Cov}(\log x_i, \log x_{i-h}) = \begin{cases} k \operatorname{Var}(\log M) + \operatorname{Var}(\log \varepsilon_i) & \text{if } h = 0, \\ \operatorname{Var}(\log M) \sum_{j=1}^{k} (1 - \gamma_j)^{|h|} & \text{if } h \ne 0, \end{cases}$$

from which the spectral density of the logarithmic MSMD process can be readily computed
via the Fourier transform (see Appendix A.3). It reads:

$$f(\omega; \theta) = \frac{\operatorname{Var}(\log M)}{2\pi} \sum_{j=1}^{k} \frac{1 - (1 - \gamma_j)^2}{1 + (1 - \gamma_j)^2 - 2(1 - \gamma_j)\cos\omega} + \frac{\operatorname{Var}(\log \varepsilon_i)}{2\pi}, \tag{3.2}$$

for $\omega \in [-\pi, \pi]$, where $\theta$ is the vector of parameters of the MSMD process.
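
The closed form (3.2) is straightforward to code. The sketch below evaluates it for a binomial MSMD with unit-mean exponential innovations, using $\operatorname{Var}(\log M) = \tfrac{1}{4}[\log m_0 - \log(2 - m_0)]^2$ and $\operatorname{Var}(\log \varepsilon_i) = \pi^2/6$, and checks numerically that the density integrates to the variance of the logarithmic durations. Function and variable names are illustrative.

```python
import math


def log_msmd_spectral_density(omega, gammas, var_log_m, var_log_eps):
    """Spectral density (3.2) of the logarithmic MSMD process at frequency omega."""
    signal = 0.0
    for gamma_j in gammas:
        rho = 1.0 - gamma_j
        signal += (1.0 - rho ** 2) / (1.0 + rho ** 2 - 2.0 * rho * math.cos(omega))
    return (var_log_m * signal + var_log_eps) / (2.0 * math.pi)


k, b, gamma_k, m0 = 3, 2, 0.5, 1.4
gammas = [1.0 - (1.0 - gamma_k) ** (b ** (j - k)) for j in range(1, k + 1)]
var_log_m = 0.25 * (math.log(m0) - math.log(2.0 - m0)) ** 2  # binomial multiplier
var_log_eps = math.pi ** 2 / 6.0                              # unit-mean exponential

# Midpoint-rule integral over [-pi, pi]; it should recover Var(log x_i).
grid = 20000
step = 2.0 * math.pi / grid
integral = step * sum(
    log_msmd_spectral_density(-math.pi + (m + 0.5) * step, gammas, var_log_m, var_log_eps)
    for m in range(grid)
)
variance_log_x = k * var_log_m + var_log_eps
```

The agreement between `integral` and `variance_log_x` is a convenient sanity check on any implementation of the spectral density before it is used in estimation.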

A drawback of the Whittle estimator compared to MLE is that it cannot in general
work with multi-parameter distributions for the multipliers $M_{j,i}$ and innovations $\varepsilon_i$; it is
clear from (3.2) that the Whittle estimator can only identify $\operatorname{Var}(\log M)$ and $\operatorname{Var}(\log \varepsilon_i)$.
In addition, the Whittle estimator cannot identify the mean of the durations process, $\psi$,
but this can be readily estimated by the sample mean if needed.

Before moving on to linear forecasting in the MSMD model, it is interesting to note
that the autocovariance function of the logarithmic MSMD process is equivalent to that of
a signal-plus-noise model, in which the signal is a sum of $k$ independent AR(1) processes:

$$z_i = \sum_{j=1}^{k} y_{j,i} + v_i, \tag{3.3}$$

$$y_{j,i} = \rho_j y_{j,i-1} + u_{j,i}, \tag{3.4}$$

parametrized by:

$$\rho_j = 1 - \gamma_j, \qquad j = 1, \ldots, k, \tag{3.5}$$

$$\sigma_{u,j}^2 = \operatorname{Var}(\log M)\left(1 - (1 - \gamma_j)^2\right), \qquad j = 1, \ldots, k, \tag{3.6}$$

$$\sigma_v^2 = \operatorname{Var}(\log \varepsilon_i). \tag{3.7}$$

In view of the work of Granger (1980) on aggregation of short-memory processes of
heterogeneous persistence, it is hardly surprising to find highly persistent logarithmic
durations generated by the MSMD process.
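
The equivalence can be verified in two lines. Writing $\sigma_{u,j}^2$ for the innovation variance of the $j$-th AR(1) component, $\rho_j = 1 - \gamma_j$ for its autoregressive coefficient and $\sigma_v^2$ for the variance of the noise (notation chosen here), and assuming the noise is white and independent of the components, the standard stationary AR(1) formulas give

$$\operatorname{Var}(y_{j,i}) = \frac{\sigma_{u,j}^2}{1 - \rho_j^2} = \operatorname{Var}(\log M), \qquad \operatorname{Cov}(y_{j,i}, y_{j,i-h}) = \operatorname{Var}(\log M)\,(1 - \gamma_j)^{|h|},$$

so that, summing over the independent components,

$$\operatorname{Cov}(z_i, z_{i-h}) = \operatorname{Var}(\log M) \sum_{j=1}^{k} (1 - \gamma_j)^{|h|} + \operatorname{Var}(\log \varepsilon_i)\,\mathbf{1}\{h = 0\},$$

which equals $k \operatorname{Var}(\log M) + \operatorname{Var}(\log \varepsilon_i)$ at $h = 0$ and matches the autocovariance function of the logarithmic durations at all other lags.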
3.3 Linear forecasting

When optimal forecasting discussed in Section 3.1 is not feasible due to
the dimensionality of the state space, Lux (2008) suggests using best linear forecasts
(e.g. Brockwell & Davis, 1991). This forecasting rule only requires the knowledge of the
autocovariance function of the model and thus works as long as one has a set of consistent
parameter estimates at hand, regardless of the estimation method used to obtain them.
Formally, an $h$-step ahead forecast based on the most recent $n$ observations, denoted by
$\hat{x}_{n+h|n}$, is obtained from

$$\hat{x}_{n+h|n} = \sum_{j=1}^{n} \phi_{nj}^{(h)} x_{n+1-j} = \phi_n^{(h)\prime} x_n, \tag{3.8}$$

where the vector of weights $\phi_n^{(h)}$ is a solution to $\Gamma_n \phi_n^{(h)} = c_n^{(h)}$, in which
$c_n^{(h)} = (c(h), c(h+1), \ldots, c(n+h-1))'$ denotes the vector of autocovariances of the true process from lag
$h$ to lag $n + h - 1$, and $\Gamma_n = \{c(i - j)\}_{i,j=1,\ldots,n}$ is the variance-covariance matrix of
$x_n = (x_1, x_2, \ldots, x_n)'$. The autocovariance function of the MSMD process is provided in
(2.8) and the weights $\phi_n^{(h)}$ can be efficiently calculated using the generalized Levinson-Durbin
algorithm developed by Brockwell & Dahlhaus (2004).
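
To make the forecasting rule concrete, the sketch below solves the linear system for the weights by plain Gaussian elimination. This is a stand-in chosen for brevity, not the generalized Levinson-Durbin algorithm of Brockwell & Dahlhaus (2004), and it costs $O(n^3)$. The forecast is applied to deviations from the mean, which is an assumption made here for a process with non-zero mean; all function names are illustrative.

```python
def solve_linear_system(a, rhs):
    """Solve a @ phi = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [r] for row, r in zip(a, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    phi = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * phi[c] for c in range(r + 1, n))
        phi[r] = (aug[r][n] - tail) / aug[r][r]
    return phi


def forecast_weights(autocov, n, h):
    """Weights phi solving Gamma_n phi = (c(h), ..., c(n + h - 1))'."""
    gamma_n = [[autocov(abs(i - j)) for j in range(n)] for i in range(n)]
    c_h = [autocov(h + j) for j in range(n)]
    return solve_linear_system(gamma_n, c_h)


def linear_forecast(x, mean, autocov, h):
    """h-step best linear forecast; the first weight multiplies the latest observation."""
    phi = forecast_weights(autocov, len(x), h)
    recent_first = [v - mean for v in reversed(x)]
    return mean + sum(p * d for p, d in zip(phi, recent_first))


# Check on an AR(1)-type autocovariance c(h) = 0.8 ** h, where the answer is known.
weights = forecast_weights(lambda lag: 0.8 ** lag, n=5, h=2)
prediction = linear_forecast([1.0, 2.0, 3.0], mean=2.0, autocov=lambda lag: 0.8 ** lag, h=1)
```

For the MSMD model, `autocov` would return the model autocovariance at the given lag, with the variance used at lag zero.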
3.4 Specification testing 

To test the goodness of fit of the MSMD model, we employ the specification test of 
Chen & Deo (2004). The idea of the test is to compare the estimated model's spectral 
density with the smoothed periodogram of the data. Under the null hypothesis of correct 
model specification, the two should be close. The main advantage of this approach is 
that the test statistic does not require residuals, which makes it particularly suitable for 
specification testing of stochastic durations models. 
The test statistic is given by 

where 



1=0 J V J' / \h\<n 



$k$ is a symmetric kernel function with $k(0) = 1$, and $p_n$ is a bandwidth parameter.
Provided that $\hat{\theta}_n$ is $\sqrt{n}$-consistent, Chen & Deo (2004) show that under some regularity
conditions:



In our implementation, we use the Bartlett kernel and following Chen & Deo (2004), set 
the bandwidth according to pn = Sn*^'^. We check by simulation the performance of the 
test for the MSMD model with this choice of kernel and bandwidth and find that the 
simulated size is very close to the nominal level. The simulation design is the same as 
in the next section and the results are available upon request. 

3.5 Simulations 

Before taking the model to the data it is worthwhile exploring the finite-sample properties
of the maximum likelihood and Whittle estimators. To do that, we run a simple
Monte Carlo experiment for the binomial and log-normal MSMD models with $k = 8$
multipliers and either exponential or Weibull innovations.¹ Following Lux (2008) we set
the parameters of the MSM process as $b = 2$, $\gamma_k = 0.5$, $m_0 = 1.4$ (binomial) and $\lambda = 0.15$
(log-normal), and the parameter in the Weibull distribution of innovations as $\kappa = 1.45$.

Due to the computational burden associated with the exact maximum likelihood 
estimator, the number of Monte Carlo replications for MLE is limited to 500, 250, and 
100 replications for n = 1000, 2500, and 5000, respectively. All simulation results for 
the Whittle estimator are based on 1,000 replications, and we also consider very large 
samples of 10,000 observations, as the application of the Whittle estimator to the MSMD 
model is new and the large-sample properties have not been investigated by simulation 
before. 

Table 1 summarizes the simulation results. Starting with the maximum likelihood
estimator in the binomial MSMD model, we find that MLE delivers accurate and almost
unbiased estimates for both exponential and Weibull specifications; the simulated standard
errors scale with $\sqrt{n}$ as dictated by asymptotic theory. As expected, the Whittle
estimator is less precise than MLE, and it also entails a significant bias in samples smaller
than 5,000 observations, particularly for the parameter $b$. The bias, however, disappears
in large samples, and the standard errors also scale with $\sqrt{n}$.

¹In an earlier version of the paper we also reported MLE simulation results for the MSMD model with
Burr and generalized gamma distributions of the innovations. The results are qualitatively similar to the
exponential and Weibull cases and show that the ML estimator works well even when the innovations
are drawn from multi-parameter distributions.

4 Competing duration models 

Models in the duration literature mimic those in the stochastic volatility literature, and 
might be similarly divided into observable or GARCH-type models and latent factor or 
Stochastic Volatility (SV)-type models. The Autoregressive Conditional Duration (ACD) 
model of Engle & Russell (1998) is a member of the former class, and was extended by
Jasiak (1998) to the Fractionally Integrated ACD (FIACD) model to incorporate long
memory. Bauwens and Veredas' (2004) Stochastic Conditional Duration (SCD) model
is a latent factor model, and was modified by Deo, Hsieh & Hurvich (2006) to create the
LMSD model by letting the latent factor follow a long-memory process. 

It is beyond the scope of this paper to review and compare all existing duration
models; we refer the reader to a survey by Pacurar (2008). Since we are interested
in modeling and forecasting persistent durations, we focus here on those models that
can capture slowly decaying autocorrelations. As noted by Deo et al. (2010), the FIACD
model is not a long-memory model in the usual sense, as it has infinite mean and hence the
autocorrelation function does not exist. We are therefore left with the LMSD model as
the only genuine long-memory duration model with well-behaved moments. To assess the
benefits of using the relatively more complicated MSMD and LMSD models in practice,
we also compare their performance with the short-memory ACD model of Engle & Russell
(1998).

4.1 The ACD model 

Engle & Russell (1998) suggest that the durations, $x_i$, obey the following process,
abbreviated as ACD($p$,$q$):

$$x_i = \psi_i \varepsilon_i, \qquad \psi_i = \omega + \sum_{j=1}^{q} \alpha_j x_{i-j} + \sum_{j=1}^{p} \beta_j \psi_{i-j}, \qquad \varepsilon_i \sim D(\theta_\varepsilon),$$

where $\omega$, $\alpha_j$ and $\beta_j$ are parameters to be estimated, $\psi_i$ is the conditional duration, i.e. the
conditional mean of $x_i$, $E_{i-1}(x_i) = \psi_i$, and $\varepsilon_i$ is the duration innovation having a
distribution with positive support. Sufficient conditions for positive durations are that
$\omega > 0$, $\alpha_j \ge 0$ and $\beta_j \ge 0$. Weak stationarity is guaranteed by
$\sum_{j=1}^{q} \alpha_j + \sum_{j=1}^{p} \beta_j < 1$. Overall, the model specification is similar to a GARCH model,
except that the conditional mean is being modelled as opposed to the conditional volatility.
The autocovariance function of the ACD model decays exponentially, so the model cannot
generate long memory, which is characterized by hyperbolic decay.
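The ACD recursion described above can be illustrated with a short simulation. The following is a minimal sketch in Python, assuming $p = q = 1$ and unit-mean exponential innovations; the function names `simulate_acd11` and `acd11_acf` and the parameter values used with them are illustrative and do not come from the paper. The second function relies on the ARMA(1,1) representation of the ACD(1,1) model (and assumes the durations have finite variance) to show that the autocorrelations decay geometrically at rate alpha + beta.

```python
import random


def simulate_acd11(n, omega, alpha, beta, seed=0, burn_in=500):
    """Simulate n durations from an ACD(1,1) with unit-mean exponential innovations."""
    rng = random.Random(seed)
    mean = omega / (1.0 - alpha - beta)      # unconditional mean of the durations
    psi, x = mean, mean                      # start the recursion at the mean
    out = []
    for i in range(n + burn_in):
        psi = omega + alpha * x + beta * psi  # conditional duration psi_i
        x = psi * rng.expovariate(1.0)        # x_i = psi_i * eps_i, with E(eps_i) = 1
        if i >= burn_in:
            out.append(x)
    return out


def acd11_acf(alpha, beta, max_lag):
    """Autocorrelations of an ACD(1,1) implied by its ARMA(1,1) representation."""
    rho1 = alpha * (1.0 - beta ** 2 - alpha * beta) / (1.0 - beta ** 2 - 2.0 * alpha * beta)
    # Beyond lag one the autocorrelations decay geometrically at rate (alpha + beta).
    return [rho1 * (alpha + beta) ** (h - 1) for h in range(1, max_lag + 1)]
```

For instance, `acd11_acf(0.1, 0.8, 50)` can be compared with the sample autocorrelations of `simulate_acd11(20000, 0.1, 0.1, 0.8)` to see the geometric decay in practice.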

The ACD models can be estimated using maximum likelihood, given the distribution
of the disturbance term. Engle & Russell (1998) propose the exponential and Weibull
distributions, while Grammig & Maurer (2000) suggest the Burr distribution and Lunde
(1999) the generalized gamma distribution. An attractive property of the exponential
distribution is that the maximum likelihood estimator has a QMLE interpretation, akin
to the MLE of the GARCH model under normality. Forecasting in the ACD model proceeds
via the ARMA representation (see Engle & Russell, 1998 for details).

4.2 The LMSD model 

Bauwens & Veredas (2004) propose the Stochastic Conditional Duration (SCD)
model given by:

$$x_i = \varepsilon_i e^{\psi_i}, \qquad \varepsilon_i \sim D(\theta_\varepsilon), \qquad (4.1)$$

$$\psi_i = \omega + \beta \psi_{i-1} + u_i, \qquad u_i \sim N(0, \sigma_u^2), \qquad (4.2)$$

where $\varepsilon_i$ and $u_i$ are independent and $\omega$ and $\beta$ are parameters to be estimated. Unlike the
ACD model, no conditions on parameters are required to ensure positive durations. Also,
weak stationarity is guaranteed as long as $|\beta| < 1$, which is a simpler condition
than for the ACD model. Overall, the model specification is similar to a stochastic
volatility model.

While the ACD has only one, observable random variable driving the system dynamics,
the SCD model has an observable random variable driving the observed duration and
a latent random variable, $u_i$, driving the conditional duration (now $e^{\psi_i}$) via an AR(1)
process. The extra random variable enables a richer dynamic structure: Bauwens &
Veredas (2004) point out that the parameters governing dispersion ($\sigma_u$) and persistence
($\beta$) are separated under the SCD model, whereas they are the same in the ACD model
($\alpha + \beta$), so enabling the SCD model to fit a greater variety of persistence-dispersion
profiles.

As with the ACD model, the SCD model is only capable of generating geometric
decay in the autocovariance function. In order to enable long memory, Deo et al. (2006)
introduce the LMSD process, in which the logged conditional duration equation is
replaced with:

$$\psi_i = \omega + \beta \psi_{i-1} + (1 - L)^{-d} u_i,$$

where $L$ is the lag operator and $d$ is the long-memory parameter. Here there is more
persistence because the logged conditional duration equation has
changed from an AR(1) process to an ARFIMA process.
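To make the difference between the SCD and LMSD dynamics concrete, the sketch below simulates the logged conditional duration with fractionally integrated noise. It is only an illustration under stated assumptions: the fractional filter $(1 - L)^{-d}$ is approximated by truncating its moving-average expansion after `trunc` terms, the innovations are unit-mean exponential, and the names `frac_weights` and `simulate_lmsd` are hypothetical. Setting `d = 0` reduces the recursion to the SCD model in (4.1)-(4.2).

```python
import math
import random


def frac_weights(d, m):
    """First m moving-average weights of the fractional filter (1 - L)^(-d)."""
    w = [1.0]
    for j in range(1, m):
        w.append(w[-1] * (j - 1 + d) / j)   # pi_j = pi_{j-1} * (j - 1 + d) / j
    return w


def simulate_lmsd(n, omega, beta, d, sigma_u, trunc=200, seed=0):
    """Simulate n durations x_i = eps_i * exp(psi_i) with long-memory psi_i."""
    rng = random.Random(seed)
    w = frac_weights(d, trunc)
    u = [rng.gauss(0.0, sigma_u) for _ in range(n + trunc)]
    psi = omega / (1.0 - beta)              # start at the mean of psi_i
    x = []
    for i in range(trunc, n + trunc):
        # Truncated approximation to (1 - L)^(-d) u_i.
        v = sum(w[j] * u[i - j] for j in range(trunc))
        psi = omega + beta * psi + v
        x.append(rng.expovariate(1.0) * math.exp(psi))
    return x
```

The truncation length trades accuracy for speed: the weights decay hyperbolically, so a short truncation understates the persistence at long lags.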

Estimation of the SCD and LMSD models is less straightforward owing to the unob- 
servable factor. Bauwens & Veredas (2004) advocate employing the Kalman Filter, while 
Deo et al. (2006) suggest QMLE using the Whittle approximation. We adopt the latter 
approach here. The Whittle estimator of the parameters is consistent and asymptotically 
normal. Forecasting the SCD and LMSD models is possible either through calibration 
of the best linear predictor, as advocated by Deo et al. (2010), or via the Kalman Fil- 
ter. While the LMSD process contains an infinite series of coefficients, it is still possible 
to create a state-space form as observed by Chan & Palma (1998) and we adopt their 
approach here. 

5 Relation to counts and realized volatility 

Deo et al. (2009) and Deo et al. (2010) investigate the propagation of memory
from durations to counts and thereby to realized volatility.^ They show that if durations have
long (short) memory, then under certain conditions the counts have long (short) memory
as well. To fix notation, recall that $t_i$ denotes the time of the $i$-th event, $x_i = t_i - t_{i-1}$ is
the duration between two consecutive events, and let $N(t)$ denote the counting process
that counts the number of events that have occurred up to time $t$.

In more detail, counts and durations are stationary under different measures, since
they define the irregularly-spaced event process (a point process) in terms of different
sets of events. We refer to these measures as $P_N$ and $P$, respectively. As illuminated
by Deo et al. (2009), the relevant measure depends on how $N(t)$ is calculated: if it is
calculated from the opening of the market on a given day, the relevant measure is $P_N$,
while if from the first event on that day, the relevant measure is $P$. Since most assets
tend to be heavily traded after market opening, the difference may be empirically small.

^See ? for a review of the literature on realized volatility.

By making use of equivalence theorems (e.g. ?, 1989), Deo et al. (2009) establish
conditions under which memory propagates from durations to counts, then to squared
returns and realized volatility. In particular, they show that under certain conditions, the
short memory of durations generated by a stationary ACD model implies short memory
in the associated counts and realized volatility, while the long memory of durations in
the LMSD model implies long memory in counts and realized volatility. Turning to
the MSMD process, the following proposition establishes the conditions under
which the short-memory feature of the MSMD (for finite $k$) translates into short memory
in the induced counts.

Proposition 1 Let $\{x_i\}$ be an MSMD process with finite $k$, $E(M^{3+r}) < \infty$ and $E(\varepsilon^{3+r}) < \infty$
for some $r > 0$. Then the induced counting process $N(t)$ satisfies $\mathrm{Var}_N(N(t)) \sim ct$
for some $c < \infty$, where $\mathrm{Var}_N$ denotes the variance under $P_N$.

To link the counts and realized volatility, we follow Deo et al. (2009) and employ the
simple continuous-time pure-jump model of Oomen (2006). The logarithmic price process,
$p(t)$, is assumed to have the following dynamics:

$$p(t) = p(0) + \sum_{j=1}^{N(t)} \xi_j, \qquad \xi_j \overset{iid}{\sim} N(0, \sigma_\xi^2), \qquad (5.1)$$

where $N(t)$ is the counting process defined above and $\xi_j$ is the size of the $j$-th jump. A
natural measure of variation in the model is the quadratic variation given by:

$$QV(t) = \sum_{j=1}^{N(t)} \xi_j^2.$$

The quadratic variation can be estimated consistently by realized variance. Dividing
the time interval $[0,t]$ into $n$ non-overlapping intervals of length $\delta t = t/n$, the realized
variance is defined as:

$$RV_{t,n} = \sum_{i=1}^{n} \left( p(i/n) - p((i-1)/n) \right)^2. \qquad (5.2)$$

It follows from Deo et al. (2009) that for the MSMD process satisfying the assumptions 
of Proposition 1, the realized volatility is a short-memory process. 

It is difficult to derive analytically the autocorrelation functions of counts and realized
volatility induced by the MSMD process and its competitors. We therefore proceed by
simulation. For each duration model, we simulate a trajectory of the induced counting
process $N(t)$ and via (5.1) a trajectory of the associated logarithmic price process $p(t)$.
From the simulated price process we then calculate a time series of daily realized variance
according to (5.2), where we define one day to have 6.5 hours, or 23,400 seconds. For
all duration models, we set the unconditional mean of durations equal to 2 minutes, so
that there are around 195 price changes on a typical day in the simulation. The price
innovations, $\xi_j$, are drawn randomly from the normal distribution with zero mean and
variance $\sigma_\xi^2 = 1/195$, implying that the average daily realized variance is around 1%.
Finally, to facilitate comparison we calibrate the parameters of the duration models so
that they share the same first-order autocorrelation coefficient, which we set equal to
0.45; see the caption of Figure 3 for the exact parameter values used in the simulations.
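The simulation design described above can be sketched in a few lines of Python. This is an illustration rather than the code used for the paper: it takes any simulated sequence of durations (in seconds), places a normally distributed price jump at each event time as in (5.1), and computes one realized variance per complete day from equally spaced intraday returns as in (5.2). The function name `daily_realized_variance` and the choice of 78 five-minute intervals per day are assumptions made for the example.

```python
import random


def daily_realized_variance(durations, day_len=23400.0, n_intervals=78,
                            sigma2=1.0 / 195.0, seed=0):
    """Daily realized variance of a pure-jump price driven by the given durations."""
    rng = random.Random(seed)
    sd = sigma2 ** 0.5
    dt = day_len / n_intervals
    t = 0.0
    increments = {}                     # price change accumulated in each sampling interval
    for x in durations:
        t += x                          # arrival time of the next price jump
        idx = int(t // dt)              # index of the sampling interval containing it
        increments[idx] = increments.get(idx, 0.0) + rng.gauss(0.0, sd)
    n_days = int(t // day_len)          # keep complete days only
    return [sum(increments.get(day * n_intervals + i, 0.0) ** 2
                for i in range(n_intervals))
            for day in range(n_days)]
```

With durations averaging 120 seconds and a jump variance of 1/195, the daily realized variance produced by this sketch averages about one, in line with the calibration in the text; its autocorrelation function can then be computed for each duration model.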

Figure 3 plots the theoretical autocorrelation functions of the ACD, MSMD and
LMSD durations and their corresponding simulated autocorrelation functions for realized
variance as implied by model (5.1). The figure clearly illustrates how memory
propagates from durations to realized volatility. The short-memory ACD model generates
realized variance with little persistence, while the long-memory LMSD generates a
highly persistent realized variance. The MSMD model is capable of generating both:
when the number of multipliers is small ($k = 4$), the autocorrelation function of realized
variance decays very quickly despite the ACF of durations being quite persistent.
Increasing the number of multipliers to 8, the persistence of realized volatility increases
dramatically and its ACF now clearly exhibits long-memory features. Thus, despite
being short-memory, the MSMD model is capable of generating both highly persistent
durations as well as highly persistent realized volatility in the pure jump model (5.1).



6 Data Description 

We now apply the MSMD model and its competitors to price durations of three major 
foreign exchange (FX) futures contracts traded on the Chicago Mercantile Exchange 
(CME). Our dataset includes all transactions for the Swiss Franc (CHF), Euro (EUR) 
and Japanese Yen (JPY) futures contracts between 9 November 2009 and 29 January 
2010. The data is supplied by TickData, Inc. We focus on the most liquid (front) 
contracts and restrict attention to the main CME trading hours of 7:20 - 14:00 Chicago 
time. US and UK Bank holidays are discarded. 

Price durations are defined as the minimum time it takes for the price to move by a
certain amount. We construct them from the transaction durations, which are simply
the durations between successive trades, by a process called thinning. Due to microstructure
frictions, such as bid-ask bounce, the price durations may be more informative about
the underlying price process and its volatility, as thinning reduces the distortions due to
microstructure noise and eliminates duplicate prices, that is, transactions with zero price
changes. Also, Engle & Russell (1998) show that price durations are closely related to
the instantaneous volatility: short price durations imply high instantaneous volatility of
the underlying price process, and vice versa.

Correspondingly, we construct the price durations by successively measuring the
minimum time required for the futures price to move by at least $c$, starting from the first
transaction on each day and discarding overnight durations. The FX futures contracts
are highly liquid and usually trade with a tight bid-ask spread of 1-2 ticks, where the tick
size equals 0.0001 for CHF and EUR and 0.01 for JPY. To eliminate spurious price changes
due to the bid-ask bounce, we set $c = 0.0003$ for CHF and EUR and $c = 0.03$ for JPY. To
facilitate comparison across the different currencies, we work with the first 12,000 price
durations for each FX futures contract available in our sample period. The sample size
is therefore kept fixed at 12,000 but the sample period varies across the three data sets,
though they all start on 9 November 2009.
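The thinning step described above can be sketched as follows. This is a minimal illustration, assuming the transactions of a single trading day are given as parallel lists of timestamps (in seconds) and prices; the function name `price_durations` and the small numerical tolerance are hypothetical choices and not part of the paper.

```python
def price_durations(times, prices, c):
    """Thin one day of transactions into price durations for a threshold c."""
    durations = []
    ref_time, ref_price = times[0], prices[0]   # start from the first trade of the day
    for t, p in zip(times[1:], prices[1:]):
        # A price event occurs once the price has moved by at least c since the
        # last price event; the tolerance guards against floating-point rounding.
        if abs(p - ref_price) >= c - 1e-12:
            durations.append(t - ref_time)
            ref_time, ref_price = t, p
    return durations
```

Applying the function day by day and concatenating the results discards overnight durations automatically, because the reference trade is reset at the start of each day.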

Table 2 reports the descriptive statistics for the FX futures price durations data. The
mean of the price durations is 118s, 106s and 90s for CHF, EUR and JPY, respectively,
while the median is around half the mean at 66s, 59s and 43s, respectively, indicating
that the distributions of the price durations are heavily positively skewed. The minimum
price duration equals 1s for all currencies, while the maximum price duration reaches
42 minutes, 1 hour and 53 minutes, respectively. Consistent with previous empirical
evidence, we find that the distribution of price durations exhibits over-dispersion, i.e.
the standard deviation of the price durations exceeds the mean, here by a factor
of 1.351, 1.337 and 1.512 for CHF, EUR and JPY, respectively.

It is well-known that the trading activity in most financial markets varies considerably 
over the course of the day, see e.g. Engle & Russell (1998) who note a hump-shaped 
pattern for transaction and price durations of individual stocks traded on the New York 
Stock Exchange (NYSE), with relatively shorter durations at the start and end of the 
trading day, and longer durations during lunchtime. Consequently, the duration process 
contains a significant seasonal component that has to be accounted for when estimating 
a duration model. 

There are in principle two ways to do that. First, one can incorporate seasonality into
the duration models directly and estimate the seasonal parameters jointly with the
dynamic parameters of the duration process (Rodriguez-Poo, Veredas & Espasa, 2007).
Alternatively, one can first estimate the seasonal component semi- or non-parametrically
and fit the duration model to the seasonally-adjusted durations (e.g. Engle & Russell,
1998, and Fernandes & Grammig, 2006 among many others). Engle (2000) notes that the
large sample sizes typically available in empirical work make the loss of efficiency of the
two-step procedure relatively small. Given the complexity of the duration models we are
considering in this paper, we opt for the two-stage approach and employ nonparametric
regression (the Nadaraya-Watson estimator) to estimate the seasonal component of the
price durations, separately for each day of the week as in Bauwens & Veredas (2004).
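A minimal sketch of the two-stage adjustment is given below. It assumes a Gaussian kernel and a user-supplied bandwidth, neither of which is specified in the text, and it pools all observations rather than estimating the pattern separately for each day of the week; the function names `nadaraya_watson` and `deseasonalize` are illustrative only.

```python
import math


def nadaraya_watson(grid, t_obs, y_obs, bandwidth):
    """Gaussian-kernel Nadaraya-Watson estimate of E(y | t) at each grid point."""
    fitted = []
    for g in grid:
        weights = [math.exp(-0.5 * ((g - t) / bandwidth) ** 2) for t in t_obs]
        fitted.append(sum(w * y for w, y in zip(weights, y_obs)) / sum(weights))
    return fitted


def deseasonalize(t_obs, x_obs, bandwidth):
    """Divide each raw duration by the estimated time-of-day component."""
    season = nadaraya_watson(t_obs, t_obs, x_obs, bandwidth)
    return [x / s for x, s in zip(x_obs, season)]
```

Here `t_obs` would hold the time of day at which each price duration starts and `x_obs` the raw durations; to follow the day-of-week treatment described above, the function would be applied to each weekday's observations in turn.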

The estimated intraday seasonal patterns are reported in the top panel of Figure
4. The diurnal pattern is relatively stable across the days of the week and currencies
up to around 11:00 Chicago time. During this period the U.S. and European trading
hours overlap and trading activity in the market is at its peak. After 11:00, trading in
London, where a large proportion of global FX trading takes place (King, Osler & Rime,
2011), gradually ceases and the average price durations become progressively longer.
The exception is Wednesdays, for which we observe a significant dip in the average price
durations around 13:30, most likely due to elevated volatility surrounding macroeconomic
announcements.

Figure 4 plots the autocorrelation function of the adjusted durations obtained by
dividing the raw durations by the estimated intraday component. Clearly, the persistence
in the price durations is not induced by the seasonal component. The descriptive
statistics for the adjusted durations are reported in Table 2. The mean is, by construction,
close to one, the median remains significantly lower than the mean, and over-dispersion
is slightly attenuated by the adjustment. The empirical densities of the standardized
durations, estimated in Figure 4 by a boundary-corrected kernel estimator, are
non-monotonic and heavily positively skewed.

7 Empirical Results 

This section compares the estimation and forecasting performance of our MSMD
model to the competing ACD and LMSD models. We use the first 10,000 observations for
estimation and in-sample specification tests and reserve the remaining 2,000 observations
for evaluating out-of-sample forecasting performance. As is common in the durations
literature, in the rest of the paper we work exclusively with the seasonally-adjusted
durations. Since the mean of the standardized durations is, by construction, close to
one, we impose this restriction in all models and do not report the (restricted) estimates
of the various constant terms ($\bar{\psi}$ in the MSMD model and $\omega$ in the ACD and LMSD
models).

7.1 Estimation results 

We start by describing the in-sample estimates of MSMD for the three currencies in
our sample. We estimate the MSMD model with $k = 4$, 6 and 8 multipliers; increasing
the number of multipliers beyond 8 does not improve the in-sample and out-of-sample
performance of the model. We use exact maximum likelihood to estimate the MSMD
model with binomial multipliers and the Whittle estimator for both the binomial and
log-normal multipliers. All models are estimated with either the exponential or the
Weibull distribution of innovations. Since the MSMD parameter space is not compact
($0 < m_0 < 1$, $\lambda > 0$, $b > 1$ and $0 < \gamma_k < 1$) some constraints are generally required to
achieve numerical stability of the optimization routines. For both MLE and Whittle
estimation, we use the MaxSQP function in the Ox language of ? to maximize the respective
objective functions and search over the following parameter space: $m_0 \in [1.001, 1.999]$,
$\lambda \in [0.001, 10]$, $b \in [1.001, 10]$ and $\gamma_k \in [0.001, 0.999]$. Standard errors for the Whittle
estimates are obtained by the usual sandwich method for extremum estimators (e.g. ?,
2003), while for MLE we use the usual inverse negative Hessian.

Tables 3, 4 and 5 show the estimation results for the CHF, EUR and JPY, respectively.
All estimated parameters have reasonable standard errors. The goodness-of-fit
test of Chen & Deo (2004) strongly rejects the null hypothesis of correct model specification
for all MSMD models with exponentially distributed innovations. This is generally
true for all values of $k$, and across currencies. In contrast, both the binomial and the
log-normal MSMD models with Weibull innovations seem to be correctly specified, as we
cannot reject the null hypothesis at the 5% level for any of the estimated models. In
addition, the log-likelihood is uniformly higher for the binomial MSMD models with Weibull
innovations. Thus, the additional flexibility of the Weibull distribution seems to improve
the in-sample fit of the MSMD models significantly.

Turning to the number of multipliers, we find that the log-likelihood increases with
increasing $k$ in all MSMD models with Weibull innovations. In the case of exponential
innovations, the models with six multipliers yield the highest log-likelihood. We
initially experimented with a wider range of values of $k$ and found that going beyond 8
multipliers offers little improvement in terms of both in-sample and out-of-sample
performance, while reducing $k$ below 4 diminishes performance considerably. The results
are available upon request.

Comparing the MLE and Whittle parameter estimates for the binomial MSMD
specifications, we find that the latter are typically smaller than the former, but generally
exhibit a similar pattern. Specifically, both the MLE and Whittle estimates of $b$ tend
to decrease with increasing $k$, while the estimates of $\gamma_k$ tend to increase. Intuitively,
holding all parameters fixed, increasing the number of multipliers increases the persistence
of the MSMD process (see Figure 2(d)), and hence to fit a given persistence in
the data the parameters $b$ and $\gamma_k$ must fall and/or rise, respectively, to compensate (see
Figure 2(b)). Additionally, we observe that the estimates of $m_0$ fall with increasing $k$,
in order to compensate for the increase in unconditional variance of the MSMD process
associated with rising $k$ (see equation (2.7)). A similar pattern is found for the parameter
$\lambda$ in the specification with log-normal multipliers. Note that it is not surprising to
find that the Whittle estimates of $b$, $\gamma_k$ and $\kappa$ (when applicable) are the same across the
binomial and log-normal specifications; the two model spectral densities only differ in
the parametrization of $\mathrm{Var}(\log M)$, see equation (3.2). This does not, however, imply
that the linear forecasts of durations obtained from these models will be the same. The
linear forecasts of durations depend on $E(M^2)$ and $\mathrm{Var}(M)$ (see equations (2.7), (2.8)
and (3.8)), and the fact that $\mathrm{Var}(\log M)$ is the same across the binomial and log-normal
specifications does not imply that $E(M^2)$ and $\mathrm{Var}(M)$ are as well. This will generally be
the case whenever the parameter estimates are obtained by implementing the Whittle
estimator on non-linearly transformed durations (logs in the present application), rather
than the durations themselves.

Having estimated the MSMD model, we now turn to the competing duration models.
Table 6 shows the results from estimating the exponential and Weibull ACD and
LMSD models for the three FX futures price durations. All estimated parameters have
reasonable standard errors. The ACD model with Weibull innovations achieves higher
log-likelihood than the ACD model with exponentially distributed innovations, but none
of these models generate higher log-likelihoods than the corresponding binomial MSMD
models estimated by maximum likelihood. The ACD parameter estimates are qualitatively
similar (relatively high $\beta$ and small $\alpha$) and imply very high persistence as ($\alpha + \beta$)
is close to one. High persistence is also implied by the LMSD parameter estimates,
where the long memory parameter estimates ($d$) lie between 0.37 and 0.50. It is difficult
to assess the relative in-sample fit of the Weibull and exponential LMSD specifications,
since the Whittle quasi-likelihoods are not directly comparable.

7.2 Out-of-sample forecasting performance

Our main interest lies in relative forecasting performance rather than in the in-sample
fit of the various duration models. As we experiment with alternative estimation methods
(MLE vs. Whittle) and forecasting schemes (optimal vs. linear), we are effectively
comparing alternative forecasting methods rather than models (?, 2006).
The goal is to shed light not only on the relative ability of the alternative models to
capture persistence in the data, but also on the impact of parameter uncertainty and
the choice of forecasting rule on relative predictive performance. Specifically, we compare
the following methods: (a) optimal forecasts from binomial MSMD(6) or MSMD(8)
models estimated by maximum likelihood; (b) linear forecasts from binomial and
log-normal MSMD(6) or MSMD(8) models estimated by the Whittle estimator; (c) Kalman
filter-based forecasts from the LMSD model estimated by the Whittle estimator; and
(d) ARMA representation-based forecasts from the ACD model estimated by maximum
likelihood. We also experiment with equally-weighted combinations of (a) and (c), and
(b) and (c), as model averaging may help reduce model uncertainty.

We compute and evaluate one step ahead and cumulative 5, 10 and 20 step ahead
forecasts of price durations. The cumulative $h$-step ahead forecast, which we denote by
$\hat{x}_{n,h}$, is obtained from the usual multi-step ahead forecasts by
$\hat{x}_{n,h} = \sum_{j=1}^{h} \hat{x}_{n+j|n}$. Thus,
$\hat{x}_{n,h}$ forecasts the time it takes for $h$ price changes to occur, as opposed to
$\hat{x}_{n+h|n}$, which forecasts the time elapsed between the $(h-1)$-th and $h$-th price changes.
We focus on the cumulative forecasts as they are more interesting in applications, for example
in predicting realized variance. We evaluate the accuracy of the forecasts using two
common loss functions, the mean square error (MSE) and the mean absolute deviation
(MAD), and assess the differences between models statistically by the Diebold & Mariano
(1995) test for equality of forecast accuracy; the Newey-West estimator is used in the
denominator of the Diebold-Mariano test statistic to account for autocorrelation in the
multi-step forecasts. Our benchmark against which we assess the MSMD and LMSD
models is the short-memory ACD, and we compare models with exponential and Weibull
innovations separately.
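The evaluation procedure can be illustrated with a short Python sketch. It assumes that the per-origin losses of two forecasting methods (squared or absolute errors of the cumulative forecasts) have already been computed, and it uses the Bartlett-kernel form of the Newey-West long-run variance with a user-chosen number of lags; the function names `cumulative_forecast` and `diebold_mariano` are illustrative and the lag choice is not taken from the paper.

```python
import math


def cumulative_forecast(multi_step):
    """Cumulative h-step forecast: the sum of the 1- to h-step-ahead forecasts."""
    return sum(multi_step)


def diebold_mariano(loss_a, loss_b, lags):
    """Diebold-Mariano statistic with a Newey-West (Bartlett) long-run variance."""
    d = [a - b for a, b in zip(loss_a, loss_b)]      # loss differential
    n = len(d)
    dbar = sum(d) / n
    e = [di - dbar for di in d]
    lrv = sum(ei * ei for ei in e) / n               # lag-0 autocovariance
    for lag in range(1, lags + 1):
        gamma = sum(e[t] * e[t - lag] for t in range(lag, n)) / n
        lrv += 2.0 * (1.0 - lag / (lags + 1.0)) * gamma   # Bartlett weights
    return dbar / math.sqrt(lrv / n)
```

A negative statistic indicates that method A has the lower average loss; under the null of equal accuracy the statistic is compared with standard normal critical values. The sketch does not guard against a zero long-run variance, which arises when the loss differential is constant.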

Tables 7 and 8 report the results of the forecasting performance of the different
methodologies. Although there is no uniform ranking across the currencies, forecast
horizons and loss functions, a few clear patterns emerge from the exercise. Both the
LMSD and MSMD forecasts generally outperform the ACD forecasts in terms of both
the MSE and MAD. The gains in forecasting performance increase with the forecast
horizon and are generally statistically significant at the 5% level. The MSMD model
performs better when the parameters are estimated by maximum likelihood and the
optimal forecasting rule is used, but the linear forecasting scheme coupled with parameter
estimates obtained by Whittle estimation also delivers better performance than the ACD,
although the difference is not always statistically significant. The superior in-sample fit
of the models with Weibull innovations that we documented in the previous section
does not necessarily translate into better out-of-sample performance. Similarly, while
the MSMD(8) has a higher log-likelihood in-sample, it does not always outperform the
MSMD(6) specification.

The LMSD and MSMD models generally perform on par if optimal forecasting and 
MLE estimates are used for the latter model, with the MSMD sometimes producing 
slightly better results. The forecast combinations of the LMSD and MSMD models 
almost always significantly outperform the ACD model, and this is generally true re- 
gardless of the estimation method and forecasting rule used for the MSMD model. This 
is a potentially important result for practitioners, for the Whittle estimators of both 
MSMD and LMSD parameters are very easy to implement regardless of the size of the 
sample or the various distributional assumptions made. Thus, we conclude that the 
long-memory duration models do provide better forecasts than the simple short-memory 
ACD model. 

8 Conclusion 

This paper introduces a new model for financial durations, featuring persistence that 
translates from durations to realized volatility. We establish the main properties of the 
model and propose the Whittle estimator of its parameters as an alternative to maximum 
likelihood. In an empirical application, we show that the MSMD model performs well in 
multi-step forecasting. 

There are several avenues for future research. It would be worthwhile to explore
rigorously the asymptotic properties of the Whittle estimator, and to experiment with
the "enhanced" Whittle estimator proposed by ? in order to improve the finite-sample
properties. The idea of this approach is to apply the Whittle estimator to durations
transformed as $\frac{1}{\nu}x^{\nu}$, $\nu > 0$, rather than to logarithmic durations as we did in this paper, as
the distribution of $\frac{1}{\nu}x^{\nu}$ may be closer to Gaussian for some $\nu$ than the distribution of $\log x$.
? show that this transformation is smooth in the sense that the autocovariance function
of $\frac{1}{\nu}x^{\nu}$ approaches the autocovariance function of $\log x$ as $\nu \to 0$. The moments and
spectral density of $\frac{1}{\nu}x^{\nu}$ can be obtained in closed form, which facilitates implementation.

On the empirical side, it would be interesting to use the MSMD model in various 
risk-management applications. Given the success of the model in multi-step forecasting, 
one may for example explore its ability to forecast realized volatility over short time- 
horizons, such as 1 hour, and compare the resulting forecasts with those obtained from 
popular time-series models for realized volatility. Similarly, the model may be fit to
volume durations, and used to predict market trading activity with the aim of optimally
timing trade execution. We will explore these applications in future work.

A Mathematical appendix



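A.1 Derivation of autocovariance functions in (2.8) and (3.2)

Defining $m_{j,i} = f(M_{j,i})$ for some function $f: \mathbb{R} \to \mathbb{R}$ such that $E(f^2(M_{j,i})) < \infty$, we start by showing
that:

$$E(m_{j,i} m_{j,i-h}) = E(m_{j,i})^2 + \mathrm{Var}(m_{j,i})(1 - \gamma_j)^h \qquad \text{(A.1)}$$

for $h > 0$. In the binomial MSMD model, the multiplier $M_{j,i}$, if it switches, takes the value of $m_0$ or
$(2 - m_0)$ with equal probability. To simplify notation, define $p_j = 1 - \frac{1}{2}\gamma_j$, $m_{0,1} = f(m_0)$,
$m_{0,2} = f(2 - m_0)$ and $\mathbf{m}_0 := (m_{0,1}, m_{0,2})'$. Then the transition matrix, $P_j$, associated with the $j$-th multiplier can
be written as:

$$P_j = \begin{pmatrix} p_j & 1 - p_j \\ 1 - p_j & p_j \end{pmatrix}
= \begin{pmatrix} \frac{\sqrt{2}}{2} & \frac{\sqrt{2}}{2} \\ \frac{\sqrt{2}}{2} & -\frac{\sqrt{2}}{2} \end{pmatrix}
\begin{pmatrix} 1 & 0 \\ 0 & 2p_j - 1 \end{pmatrix}
\begin{pmatrix} \frac{\sqrt{2}}{2} & \frac{\sqrt{2}}{2} \\ \frac{\sqrt{2}}{2} & -\frac{\sqrt{2}}{2} \end{pmatrix}
= C \Lambda_j C^{-1},$$

where $C$ is the matrix of eigenvectors of the transition matrix and $\Lambda_j$ holds the corresponding eigenvalues.
Then by the Law of Iterated Expectations (LIE),

$$\begin{aligned}
E(m_{j,i} m_{j,i-h}) &= E(m_{j,i} m_{j,i-h} \mid m_{j,i-h} = m_{0,1}) P(m_{j,i-h} = m_{0,1}) \\
&\quad + E(m_{j,i} m_{j,i-h} \mid m_{j,i-h} = m_{0,2}) P(m_{j,i-h} = m_{0,2}), \\
&= \tfrac{1}{2} m_{0,1} E(m_{j,i} \mid m_{j,i-h} = m_{0,1}) + \tfrac{1}{2} m_{0,2} E(m_{j,i} \mid m_{j,i-h} = m_{0,2}), \\
&= \tfrac{1}{2} \mathbf{m}_0' C \Lambda_j^h C^{-1} \mathbf{m}_0, \\
&= \tfrac{1}{4} (m_{0,1} + m_{0,2})^2 + \tfrac{1}{4} (1 - \gamma_j)^h (m_{0,1} - m_{0,2})^2, \\
&= E(m_{j,i})^2 + \mathrm{Var}(m_{j,i})(1 - \gamma_j)^h.
\end{aligned}$$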

When the multiplier $M_{j,i}$ is drawn from a continuous distribution upon switching, then the new value it
takes is different from the current value with probability one. Then we have:

$$\begin{aligned}
E(m_{j,i} m_{j,i-h}) &= E[E(m_{j,i} m_{j,i-h} \mid m_{j,i-h})], \\
&= E[m_{j,i-h} E(m_{j,i} \mid m_{j,i} \neq m_{j,i-h}) P(m_{j,i} \neq m_{j,i-h}) \\
&\quad + m_{j,i-h} E(m_{j,i} \mid m_{j,i} = m_{j,i-h}) P(m_{j,i} = m_{j,i-h})], \\
&= E[m_{j,i-h} (E(m_{j,i})(1 - (1 - \gamma_j)^h) + m_{j,i-h}(1 - \gamma_j)^h)], \\
&= E(m_{j,i})^2 (1 - (1 - \gamma_j)^h) + E(m_{j,i}^2)(1 - \gamma_j)^h, \\
&= E(m_{j,i})^2 + \mathrm{Var}(m_{j,i})(1 - \gamma_j)^h,
\end{aligned}$$

as claimed.
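The identity (A.1) for the binomial case can be checked numerically. The sketch below is an illustration only: it computes the lagged product moment by raising the two-state transition matrix to the power $h$ and compares it with the closed form; the function names and the parameter values in the usage note are hypothetical.

```python
import math


def mat_mul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[r][k] * b[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]


def moment_by_transition_matrix(m0, gamma_j, h, f=lambda m: m):
    """E(m_i m_{i-h}) computed as 0.5 * m0' P^h m0 for the binomial multiplier."""
    p = 1.0 - 0.5 * gamma_j                 # probability of keeping the same value
    P = [[p, 1.0 - p], [1.0 - p, p]]
    Ph = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(h):
        Ph = mat_mul(Ph, P)
    vals = [f(m0), f(2.0 - m0)]
    return 0.5 * sum(vals[a] * Ph[a][b] * vals[b]
                     for a in range(2) for b in range(2))


def moment_closed_form(m0, gamma_j, h, f=lambda m: m):
    """Right-hand side of (A.1): E(m)^2 + Var(m) * (1 - gamma_j)^h."""
    v1, v2 = f(m0), f(2.0 - m0)
    mean = 0.5 * (v1 + v2)
    var = 0.25 * (v1 - v2) ** 2
    return mean ** 2 + var * (1.0 - gamma_j) ** h
```

Calling both functions with, say, `m0 = 1.4`, `gamma_j = 0.3` and `f = math.log` for a range of lags gives values that agree up to floating-point error.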

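Now given that the multipliers and $\varepsilon_i$ are all mutually independent, we obtain by LIE and (A.1) for
$h > 0$:

$$\begin{aligned}
\mathrm{Cov}(x_i, x_{i-h}) &= \mathrm{Cov}(\psi_i \varepsilon_i, \psi_{i-h} \varepsilon_{i-h}), \\
&= E(\psi_i \psi_{i-h}) E(\varepsilon_i) E(\varepsilon_{i-h}) - E(\psi_i) E(\psi_{i-h}) E(\varepsilon_i) E(\varepsilon_{i-h}), \\
&= \bar{\psi}^2 \left[ \prod_{j=1}^{k} E(M_{j,i} M_{j,i-h}) - \prod_{j=1}^{k} E(M_{j,i})^2 \right], \\
&= \bar{\psi}^2 \left[ \prod_{j=1}^{k} \left[ 1 + \mathrm{Var}(M)(1 - \gamma_j)^h \right] - 1 \right],
\end{aligned}$$

and for $h = 0$:

$$\begin{aligned}
\mathrm{Var}(x_i) &= E(\psi_i^2) E(\varepsilon_i^2) - E(\psi_i)^2 E(\varepsilon_i)^2, \\
&= \bar{\psi}^2 \left[ E(M^2)^k E(\varepsilon_i^2) - 1 \right],
\end{aligned}$$

as claimed.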
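The two expressions above give the autocorrelation function of MSMD durations in closed form. The following Python sketch evaluates it for a given set of switching probabilities; it is an illustration under stated assumptions, with the function name `msmd_acf` hypothetical and the probabilities $\gamma_1, \dots, \gamma_k$ supplied directly by the user rather than generated from $b$ and $\gamma_k$. The scale constant cancels in the ratio, so it does not appear.

```python
def msmd_acf(gammas, var_m, second_moment_eps, max_lag):
    """Autocorrelations of MSMD durations implied by the closed-form autocovariances."""
    k = len(gammas)
    # Var(x_i) up to scale: E(M^2)^k * E(eps^2) - 1, with E(M^2) = 1 + Var(M).
    variance = (1.0 + var_m) ** k * second_moment_eps - 1.0
    acf = []
    for h in range(1, max_lag + 1):
        prod = 1.0
        for g in gammas:
            prod *= 1.0 + var_m * (1.0 - g) ** h
        acf.append((prod - 1.0) / variance)   # Cov(x_i, x_{i-h}) / Var(x_i)
    return acf
```

For binomial multipliers taking the values $m_0$ and $2 - m_0$ with equal probability, `var_m` equals $(m_0 - 1)^2$, and for unit-mean exponential innovations `second_moment_eps` equals 2.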

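Turning to the autocovariance function for logarithmic durations, taking logs of both sides of $x_i$, we
have:

$$\log x_i = \log \bar{\psi} + \sum_{j=1}^{k} m_{j,i} + \log \varepsilon_i,$$

where $m_{j,i} = \log M_{j,i}$. Given that the multipliers and $\varepsilon_i$ are all independent, we obtain by (A.1) for
$h > 0$:

$$\mathrm{Cov}(\log x_i, \log x_{i-h}) = \sum_{j=1}^{k} \mathrm{Cov}(m_{j,i}, m_{j,i-h}) = \mathrm{Var}(\log M) \sum_{j=1}^{k} (1 - \gamma_j)^h,$$

and for $h = 0$:

$$\mathrm{Var}(\log x_i) = \sum_{j=1}^{k} \mathrm{Var}(m_{j,i}) + \mathrm{Var}(\log \varepsilon_i) = k \mathrm{Var}(\log M) + \mathrm{Var}(\log \varepsilon_i),$$

as claimed.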



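A.2 Derivation of the spectral density in (3.2)

The spectral density then follows directly by calculating the discrete Fourier transform of the
autocovariance function:

$$\begin{aligned}
f(\omega) &= \frac{1}{2\pi} \sum_{h=-\infty}^{\infty} \mathrm{Cov}(\log x_i, \log x_{i-h}) e^{-\mathrm{i}\omega h}, \\
&= \frac{\mathrm{Var}(\log \varepsilon_i)}{2\pi} + \frac{1}{2\pi} \sum_{h=-\infty}^{\infty} \left( \mathrm{Var}(\log M) \sum_{j=1}^{k} (1 - \gamma_j)^{|h|} \right) e^{-\mathrm{i}\omega h}, \\
&= \frac{\mathrm{Var}(\log \varepsilon_i)}{2\pi} + \frac{\mathrm{Var}(\log M)}{2\pi} \sum_{j=1}^{k} \frac{1 - (1 - \gamma_j)^2}{1 + (1 - \gamma_j)^2 - 2(1 - \gamma_j)\cos\omega},
\end{aligned}$$

where we use the well-known fact that for any $\rho \in (-1, 1)$,

$$\sum_{h=-\infty}^{\infty} \rho^{|h|} e^{-\mathrm{i}\omega h} = \frac{1 - \rho^2}{1 + \rho^2 - 2\rho\cos\omega}.$$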
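The geometric-sum identity and the resulting spectral density can be checked numerically. The Python sketch below is an illustration only: it compares the closed-form kernel with a truncated version of the infinite sum and evaluates the spectral density of logarithmic durations for a given list of switching probabilities. The function names are hypothetical, and the probabilities are passed in directly rather than derived from $b$ and $\gamma_k$.

```python
import cmath
import math


def geometric_kernel(rho, omega):
    """Closed form of the sum over h of rho^|h| * exp(-i * omega * h)."""
    return (1.0 - rho ** 2) / (1.0 + rho ** 2 - 2.0 * rho * math.cos(omega))


def truncated_sum(rho, omega, H=2000):
    """Direct evaluation of the same sum, truncated at |h| <= H."""
    total = sum(rho ** abs(h) * cmath.exp(-1j * omega * h)
                for h in range(-H, H + 1))
    return total.real                      # imaginary parts cancel by symmetry


def log_duration_spectrum(omega, gammas, var_log_m, var_log_eps):
    """Spectral density of log durations implied by the derivation above."""
    s = sum(geometric_kernel(1.0 - g, omega) for g in gammas)
    return (var_log_eps + var_log_m * s) / (2.0 * math.pi)
```

Integrating `log_duration_spectrum` over one period recovers the variance of the logarithmic durations, which is a convenient sanity check before the function is used inside a Whittle objective.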



A. 3 Proof of Proposition 1 

We need to show that the assumptions of Theorem 1 in Deo et al. (2009) are satisfied. Assumption (ii)
requires the duration process to be exponential strong mixing, and this was established in Section 2.1; it
is well known that exponential $\beta$-mixing implies exponential strong mixing since the $\beta$-mixing coefficients
dominate the strong mixing coefficients (see, e.g., Davidson, 1994, Chapter 14). Turning to Assumption (i) of
Theorem 1 in Deo et al. (2009), assume $\psi = 1$ without loss of generality and calculate:

\[
\begin{aligned}
\frac{1}{n} \mathrm{Var}\left( \sum_{i=1}^{n} x_i \right) &= \mathrm{E}(\psi_i^2)\,\mathrm{E}(\epsilon_i^2) + \frac{2}{n} \sum_{i=1}^{n} \sum_{h=1}^{n-i} \mathrm{E}(\psi_i \psi_{i+h}) - n \\
&= (1 + \mathrm{Var}(M))^k (1 + \mathrm{Var}(\epsilon_i)) + \frac{2}{n} \sum_{i=1}^{n} \sum_{h=1}^{n-i} \prod_{j=1}^{k} \left( 1 + \mathrm{Var}(M)(1 - \gamma_j)^h \right) - n,
\end{aligned}
\]



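where the last line follows from (A.1) by taking $f(x) = x$. Now

\[
\begin{aligned}
\frac{2}{n} \sum_{i=1}^{n} \sum_{h=1}^{n-i} \prod_{j=1}^{k} \left( 1 + \mathrm{Var}(M)(1 - \gamma_j)^h \right)
&\le \frac{2}{n} \sum_{i=1}^{n} \sum_{h=1}^{n-i} \prod_{j=1}^{k} \left( 1 + \mathrm{Var}(M)(1 - \gamma)^h \right) \qquad \text{(A.2)} \\
&= \frac{2}{n} \sum_{i=1}^{n} \sum_{h=1}^{n-i} \left( 1 + \mathrm{Var}(M)(1 - \gamma)^h \right)^k \\
&= \frac{2}{n} \sum_{i=1}^{n} \sum_{h=1}^{n-i} \left( 1 + \sum_{p=0}^{k-1} \binom{k}{p+1} \left[ \mathrm{Var}(M)(1 - \gamma)^h \right]^{p+1} \right) \\
&\le \frac{2}{n} \sum_{i=1}^{n} \sum_{h=1}^{n-i} \left( 1 + (1 - \gamma)^h \sum_{p=0}^{k-1} \binom{k}{p+1} \mathrm{Var}(M)^{p+1} \right) \\
&= \frac{2}{n} \sum_{i=1}^{n} \sum_{h=1}^{n-i} \left( 1 + (1 - \gamma)^h C \right) \\
&\le n - 1 + \frac{2C(1 - \gamma)}{\gamma}, \qquad \text{(A.3)}
\end{aligned}
\]

where $C$ is a strictly positive constant. Hence $\frac{1}{n} \mathrm{Var}\left( \sum_{i=1}^{n} x_i \right)$ converges. It is easy to see that the term
on the left-hand side of (A.2) is bounded from below by $n - 1$ for all $n$. It follows that the limit as $n \to \infty$
of $\frac{1}{n} \mathrm{Var}\left( \sum_{i=1}^{n} x_i \right)$ is strictly positive. Denote the limit by $\sigma^2$ and define $Y_n(s) = n^{-1/2} \sum_{i=1}^{\lfloor ns \rfloor} (x_i - 1)$.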

Since by assumption $\epsilon_i$ is i.i.d. with $3 + r$ finite moments for some $r > 0$ and independent of $\psi_i$, which
also has $3 + r$ finite moments, it follows that $x_i$ has $3 + r$ finite moments. Then by Deo et al. (2009),
$Y_n \Rightarrow \sigma W$ as $n \to \infty$, where $W$ is a standard Brownian motion. This verifies Assumption (i) of Theorem
1 in Deo et al. (2009).

Finally, by the same argument as in the proof of Theorem 3 in Deo et al. (2009), the exponential 
strong mixing property and the existence of 3 + r (where r > 0) moments of Xi imply the uniform 
integrability property required by Assumption (iii) of Theorem 1 in Deo et al. (2009). ■ 
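
The behaviour of the normalised variance used in the proof can be illustrated numerically. The Python sketch below evaluates $\frac{1}{n}\mathrm{Var}(\sum_{i=1}^{n} x_i)$ for $\psi = 1$ by rewriting it as $\mathrm{Var}(x_i) + 2\sum_{h=1}^{n-1}(1 - h/n)\,\mathrm{Cov}(x_i, x_{i+h})$. The function names and parameter values are illustrative assumptions, as is the choice of unit exponential innovations. The upper bound in the code is an elementary bound of our own in the spirit of (A.2) and (A.3), taking $\gamma$ to be the smallest switching probability and $C = (1 + \mathrm{Var}(M))^k - 1$; it is not claimed to be the exact expression in the paper.

```python
import numpy as np

def normalised_variance(n, gammas, var_m, e_eps2=2.0):
    """(1/n) Var(x_1 + ... + x_n) for psi = 1, assuming E(M) = E(eps) = 1."""
    gammas = np.asarray(gammas, dtype=float)
    h = np.arange(1, n)
    # Cov(x_i, x_{i+h}) = prod_j [1 + Var(M)(1 - gamma_j)^h] - 1
    cov_h = np.prod(1.0 + var_m * (1.0 - gammas[:, None]) ** h[None, :], axis=0) - 1.0
    var_x = (1.0 + var_m) ** gammas.size * e_eps2 - 1.0
    return var_x + 2.0 * np.sum((1.0 - h / n) * cov_h)

def upper_bound(gammas, var_m, e_eps2=2.0):
    """Assumed bound: Var(x) + 2 C (1 - g) / g with g = min_j gamma_j."""
    gammas = np.asarray(gammas, dtype=float)
    k = gammas.size
    c = (1.0 + var_m) ** k - 1.0
    g = gammas.min()
    var_x = (1.0 + var_m) ** k * e_eps2 - 1.0
    return var_x + 2.0 * c * (1.0 - g) / g

# Illustrative values (assumptions): three multipliers, Var(M) = 0.16
gammas = [0.05, 0.2, 0.5]
var_m = 0.16
var_x = (1.0 + var_m) ** 3 * 2.0 - 1.0
values = [normalised_variance(n, gammas, var_m) for n in (10, 100, 1000, 10000)]
bound = upper_bound(gammas, var_m)
```

In this example the normalised variance increases with $n$, stays above $\mathrm{Var}(x_i) > 0$ and remains below the bound, which is the pattern the argument relies on.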
Table 1: Monte Carlo simulation of the maximum likelihood (MLE) and Whittle
estimators of the parameters of the MSMD model with $k = 8$ binomial or log-normal
multipliers and exponential or Weibull innovations. We report average parameter
estimates obtained in the simulation together with standard errors in parentheses. The
true parameters used in the simulations are $b = 2$, $\gamma_k = 0.5$, $m_0 = 1.4$, $\lambda = 0.15$ and
$\kappa = 1.45$. The results for MLE are based on 500, 200 and 100 replications for the samples
of $n$ = 1,000, 2,000 and 5,000 observations, respectively. All simulations of the Whittle
estimator are based on 1,000 replications.
Table 2: Descriptive statistics for Swiss franc (CHF), Euro (EUR) and Japanese Yen
(JPY) futures price durations. The sample period runs between 9 November 2009 and
29 January 2010.
Table 3: MSMD parameter estimates for de-seasonalised CHF price durations. (A) Maximum
likelihood estimates (MLE) of the binomial MSMD models with exponential and
Weibull innovations, (B) Whittle estimates of the binomial MSMD models with exponential
and Weibull innovations and (C) Whittle estimates of the log-normal MSMD models
with exponential and Weibull innovations. Standard errors are reported in parentheses.
The specification test $\tau_n$ is reported with p-values in parentheses.
Table 4: MSMD parameter estimates for de-seasonalised Euro futures price durations.
(A) Maximum likelihood estimates (MLE) of the binomial MSMD models with exponential
and Weibull innovations, (B) Whittle estimates of the binomial MSMD models
with exponential and Weibull innovations and (C) Whittle estimates of the log-normal
MSMD models with exponential and Weibull innovations. Standard errors are reported
in parentheses. The specification test $\tau_n$ is reported with p-values in parentheses.
                                                   0.569      0.723      0.784
                (0.014)    (0.016)    (0.018)    (0.079)    (0.170)    (0.183)
$\kappa$                                           1.396      1.422      1.430
                    (-)        (-)        (-)    (0.031)    (0.063)    (0.073)
$\tau$           49.062     49.129     49.138      0.757      0.748      0.745
                (0.000)    (0.000)    (0.000)    (0.225)    (0.227)    (0.228)
q-log L       -2501.523  -2501.468  -2501.450  -2883.415  -2883.466  -2883.507

C. Log-normal multipliers - Whittle
$\lambda$         0.136      0.088      0.065      0.241      0.160      0.119
                (0.012)    (0.008)    (0.007)    (0.017)    (0.016)    (0.012)
$b$               1.991      1.560      1.391      3 Q54      9 574      2 046
                (0.315)    (0.157)    (0.125)    (0.422)    (0.284)    (0.185)
$\gamma_k$        0.084      0.091      0.095      0.569      0.723      0.784
                (0.014)    (0.016)    (0.011)    (0.079)    (0.170)    (0.183)
$\kappa$                                           1.396      1.422      1.430
                    (-)        (-)        (-)    (0.031)    (0.063)    (0.073)
$\tau$           49.061     49.128     49.514      0.757      0.748      0.745
                (0.000)    (0.000)    (0.000)    (0.225)    (0.227)    (0.228)
q-log L       -2501.523  -2501.468  -2501.347  -2883.415  -2883.466  -2883.507


Table 5: MSMD parameter estimates for de-seasonalised Japanese Yen futures price
durations. (A) Maximum likelihood estimates (MLE) of the binomial MSMD models with
exponential and Weibull innovations, (B) Whittle estimates of the binomial MSMD models
with exponential and Weibull innovations and (C) Whittle estimates of the log-normal
MSMD models with exponential and Weibull innovations. Standard errors are reported
in parentheses. The specification test $\tau_n$ is reported with p-values in parentheses.
                       ACD                       LMSD
                     Exp  $W(\kappa)$          Exp  $W(\kappa)$

A. Swiss Franc
$\alpha$           U.iOz        U.iOi
                 (0.005)      (0.000)          (-)          (-)
$\beta$            U.ooU        U.ooo      U.o (i)       -U.U4o
                 (0.007)      (0.000)      (0.043)      (0.099)
$d$                                          U.oUU       U.4/ /
                     (-)          (-)      (0.074)      (0.033)
                                             U.UUo        U.ooz
                     (-)          (-)      (0.001)      (0.143)
$\kappa$                        u.yyo                     i.oiy
                     (-)      (0.002)          (-)      (0.125)
logL           -8953.300    -8930.917
q-log L                                  -2928.500    -3376.700

B. Euro
$\alpha$           0.157        U.lOO
                 (0.006)      (0.006)          (-)          (-)
$\beta$            0.811        U.oiz        U.ooU       -U.U04
                 (0.008)      (0.007)      (0.054)      (0.046)
$d$                                          U.oUU       U.oo (
                     (-)          (-)      (0.073)      (0.037)
                                             U.UUo        U.04U
                     (-)          (-)      (0.002)      (0.186)
$\kappa$                        i.Uo4                     i.yoi
                     (-)      (0.007)          (-)      (0.369)
logL           -9068.800    -9058.900
q-log L                                  -3074.600    -3643.700

C. Japanese Yen
$\alpha$           0.188        0.193
                 (0.005)      (0.006)          (-)          (-)
$\beta$            0.780        0.775        0.848       -0.081
                 (0.007)      (0.008)      (0.043)      (0.025)
$d$                                          0.367        0.373
                     (-)          (-)      (0.076)      (0.034)
                                             0.020        1.022
                     (-)          (-)      (0.004)      (0.231)
$\kappa$                        0.949                     2.570
                     (-)      (0.007)          (-)      (1.067)
logL           -8500.100    -8473.900
q-log L                                  -2515.400    -2888.600


Table 6: Maximum likelihood estimates of exponential and Weibull ACD models and 
Whittle estimates of exponential and Weibull LMSD models for de-seasonalised (A) 
Swiss franc, (B) Euro and (C) Japanese Yen futures price durations. Standard errors 
are reported in parentheses. 
Figure 1: Simulated binomial and log-normal MSMD processes with six multipliers and
exponentially distributed innovations. The parameters of the processes are $b = 3$, $\gamma_k =
0.5$, $m_0 = 1.4$ and $\lambda = 0.15$.
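
A process of the kind shown in Figure 1 can be generated along the following lines. This Python sketch covers only the binomial case and rests on assumptions that are not spelled out in the caption: multipliers take the values $m_0$ and $2 - m_0$ with equal probability, each multiplier is redrawn independently with probability $\gamma_j = 1 - (1 - \gamma_k)^{b^{j-k}}$ and otherwise keeps its previous value, and innovations are unit exponential. The function name, the seed and the scale $\psi = 1$ are illustrative choices; the defaults are the caption's values read as $b = 3$, $\gamma_k = 0.5$ and $m_0 = 1.4$.

```python
import numpy as np

def simulate_binomial_msmd(n, k=6, b=3.0, gamma_k=0.5, m0=1.4, psi=1.0, seed=0):
    """Simulate n durations x_i = psi_i * eps_i from a binomial MSMD(k) sketch."""
    rng = np.random.default_rng(seed)
    j = np.arange(1, k + 1)
    # Assumed switching probabilities, lowest frequency first
    gammas = 1.0 - (1.0 - gamma_k) ** (float(b) ** (j - k))
    idx = np.arange(n)
    psi_i = np.full(n, psi, dtype=float)
    for g in gammas:
        switch = rng.random(n) < g
        switch[0] = True  # the first value is always a fresh draw
        candidates = rng.choice([m0, 2.0 - m0], size=n)
        # Index of the most recent switch at or before each observation
        last_switch = np.maximum.accumulate(np.where(switch, idx, 0))
        psi_i *= candidates[last_switch]
    eps = rng.exponential(1.0, size=n)
    return psi_i * eps

durations = simulate_binomial_msmd(200_000)
```

Because every multiplier has unit mean under these assumptions, the sample mean of the simulated durations should be close to $\psi$, although it converges slowly owing to the persistence of the low-frequency multipliers.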
Figure 3: (a) Theoretical autocorrelation functions of durations from i) the ACD model
with parameters $\alpha = 0.24$, $\beta = 0.69$, ii) the binomial MSMD(4) model with parameters
$m_0 = 1.84$, $b = 3.30$, $\gamma_k = 0.047$, iii) the binomial MSMD(8) model with parameters
$m_0 = 1.55$, $b = 3.00$, $\gamma_k = 0.076$, and iv) the LMSD model with $u = 1.028$, $\beta = 0.73$,
$d = 0.47$, $\sigma^2 = 0.029$. (b) Simulated autocorrelation functions of daily realized volatility
generated by the corresponding duration models i)-iv).
Figure 4: Foreign exchange price durations data. The top row shows the diurnal pattern
estimated by kernel regression separately for each day of the week. The second row shows
the time series of standardized durations and the third row reports the autocorrelation
functions of raw and standardized durations. The bottom row plots the empirical density
of standardized durations obtained by a boundary-corrected kernel estimator.